Introduction

The prevalence of obesity and metabolic syndrome is increasing, and as a sequela the prevalence of non-alcoholic fatty liver disease (NAFLD) and steatohepatitis (NASH) is also increasing. At present 30% of the US population already has NAFLD/NASH, and the prevalence is expected to rise [1–3]. A progression of liver fibrosis is observed in 37% of patients with NAFLD/NASH, and mortality of patients with steatohepatitis is significantly higher than in patients with simple steatosis without fibrosis [4, 5]. Therefore, the identification of liver fibrosis in patients with NAFLD/NASH is essential for the prognosis of disease progression [4, 6]. At present, liver biopsy is still the gold standard for the assessment of liver fibrosis. However, it is an invasive method associated with patients’ discomfort and in rare cases with serious complications [7–9]. In addition, the accuracy of liver biopsy is limited due to significant intra- and interobserver variability and sampling errors [10–13].

Therefore, research has focused on the evaluation of non-invasive methods for the assessment of liver fibrosis. The different approaches include routine haematological and biochemical tests, surrogate fibrosis markers in the blood and their algorithms [14] and, most intensively evaluated in recent years, ultrasound-based transient elastography (FibroScan®, Echosens, Paris, France) [15–17]. Transient elastography (TE) has shown excellent results for the diagnosis of severe fibrosis and cirrhosis and moderate results for the diagnosis of significant fibrosis in patients with NAFLD/NASH [18–20]. However, studies have shown that a BMI > 28 kg/m² is an independent risk factor for failure of the FibroScan® measurement [21]. In the largest study on transient elastography in NAFLD, successful measurement could only be obtained in 75% of obese patients with a BMI ≥ 30 kg/m² [20], leaving 25% of obese patients without a non-invasive diagnosis. This problem might be solved by the development of the new obese probe for the FibroScan®. In a study with 100 obese patients with a BMI ≥ 30 kg/m², 49% of patients who could not be measured with the standard probe (M probe) could be measured with the new obese probe (XL probe). Mean liver stiffness measurements were significantly lower using the XL probe compared with the M probe [22]. No histology was available in this study, and it remains unclear whether the diagnostic accuracy of the XL probe is comparable to that of the M probe.

The aim of the present study was to evaluate the XL probe compared with the M probe for the staging of liver fibrosis in patients with NAFLD/NASH.

Materials and methods

Patients

Fifty patients with NAFLD or NASH were enrolled consecutively between August 2008 and November 2009. All patients received transient elastography with the standard probe (M probe) and the obese probe (XL probe) on the same day of presentation. The distance between the skin and the liver capsule at the site of TE measurement was measured using conventional ultrasound. Diagnosis of NAFLD or NASH was made histologically by liver biopsy. As the mean progression rate of liver fibrosis in untreated patients was estimated to be 0.085–0.120 fibrosis stages on the Metavir scoring system per year [23], a time interval between liver biopsy and study inclusion of up to 18 months was accepted for enrolment in the present study. The time interval between liver biopsy and study inclusion ranged from 0 to 18 months (median 5.5 months, mean 7.9 ± 6.2 months). Men with alcohol consumption of more than 30 g of alcohol per week and women with more than 20 g of alcohol per week were excluded from the study. In addition, patients with other causes of liver disease (positive hepatitis B surface antigen or anti-hepatitis C virus antibody, positive autoantibodies) or histological evidence of other concomitant chronic liver diseases were excluded.

Characteristics of the patients and biochemical values are shown in Table 1.

Table 1 Patients’ characteristics

The present study was performed in accordance with the ethical guidelines of the Helsinki Declaration and was approved by the local ethics committee. Informed consent was obtained from all patients.

Liver histology

Liver biopsy specimens were fixed in 4% buffered formalin and embedded in paraffin. Two-micrometre-thick sections were stained with haematoxylin-eosin, Perls' iron stain, dPAS (periodic acid Schiff after digestion with diastase) and Masson trichrome. All biopsy specimens were analysed by an experienced pathologist (S.K.) who was blinded to the clinical results of the patients. Histological scoring was performed according to Kleiner et al. [24]. Steatosis was assessed according to the proportion of hepatocytes with fatty degeneration: S0 < 5%, S1 = 5–33%, S2 > 33–66% and S3 > 66% of hepatocytes. Liver fibrosis was staged on an F0–F4 scale according to Kleiner: F0, no fibrosis; F1, perisinusoidal or periportal fibrosis; F2, perisinusoidal and portal/periportal fibrosis; F3, bridging fibrosis; F4, cirrhosis. The NAFLD Activity Score (NAS) was calculated according to Kleiner as the unweighted sum of the scores for steatosis (0–3), lobular inflammation (0–3) and ballooning (0–2). The biopsies were judged to be adequate if the number of portal tracts was at least 6 and the length of the liver biopsy was at least 1 cm. The mean length of the included liver biopsies was 21.5 ± 8.0 mm (median 20.0 mm, range 10–40 mm).
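The scoring rules described above can be illustrated with a short sketch. The function names are hypothetical, and the handling of values lying exactly on a grade boundary (5%, 33%, 66%) is an assumption made for illustration only; in the study, scoring was performed by the pathologist and not by software.

```python
def steatosis_grade(percent_hepatocytes: float) -> int:
    """Map the percentage of steatotic hepatocytes to a grade S0-S3.

    Assumed boundaries: <5% -> 0, 5-33% -> 1, >33-66% -> 2, >66% -> 3.
    """
    if not 0 <= percent_hepatocytes <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_hepatocytes < 5:
        return 0
    if percent_hepatocytes <= 33:
        return 1
    if percent_hepatocytes <= 66:
        return 2
    return 3


def nafld_activity_score(steatosis: int, lobular_inflammation: int,
                         ballooning: int) -> int:
    """Unweighted sum of the three component scores (range 0-8)."""
    if steatosis not in range(0, 4):
        raise ValueError("steatosis score must be 0-3")
    if lobular_inflammation not in range(0, 4):
        raise ValueError("lobular inflammation score must be 0-3")
    if ballooning not in range(0, 3):
        raise ValueError("ballooning score must be 0-2")
    return steatosis + lobular_inflammation + ballooning


# Hypothetical biopsy: 40% steatotic hepatocytes, inflammation 2, ballooning 1
grade = steatosis_grade(40)
nas = nafld_activity_score(grade, 2, 1)
print(grade, nas)
```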

Transient elastography

Before transient elastography (TE) measurement the distance between the skin and the liver capsule at the site of the planned TE measurement was measured using a 7.5-MHz linear ultrasound transducer (Hitachi, EUB-900, Hitachi, Tokyo, Japan).

FibroScan® (Echosens, Paris, France) is a medical device based on TE. It is equipped with a probe including an ultrasonic transducer mounted on the axis of a vibrator. A vibration transmitted from the vibrator towards the tissue induces an elastic shear wave that propagates through the tissue. The propagation of this wave is followed by pulse-echo ultrasound acquisitions, and its velocity, which is directly related to tissue stiffness, is measured. Results are expressed in kilopascals (kPa). Details have been described in previous studies [25]. The XL probe used in this study was an early prototype version of the probe now commercialised and was loaned by the device’s manufacturer. The standard probe (M probe) and the XL prototype probe differ in the tip diameter (9 mm vs. 13 mm), the ultrasound central frequency (3.5 MHz vs. 1.75 MHz), the vibration amplitude (2 mm vs. 3 mm) and the measurement depth (2.5–6.5 cm for the M probe vs. 3.5–7.5 cm for the XL probe).
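The relation between shear wave velocity and stiffness is not spelled out here. In the general elastography literature it is commonly given as E = 3ρv², where E is Young's modulus, ρ the tissue density and v the shear wave velocity, assuming a homogeneous, isotropic, purely elastic and nearly incompressible medium. The following sketch illustrates that relation only; the function name and the density of 1000 kg/m³ are assumptions, and it does not describe the device's internal processing.

```python
def youngs_modulus_kpa(shear_velocity_m_s: float,
                       density_kg_m3: float = 1000.0) -> float:
    """Return E = 3 * rho * v**2, converted from Pa to kPa.

    density_kg_m3 defaults to an assumed soft-tissue density of 1000 kg/m^3.
    """
    if shear_velocity_m_s < 0:
        raise ValueError("shear wave velocity must be non-negative")
    return 3.0 * density_kg_m3 * shear_velocity_m_s ** 2 / 1000.0


# Hypothetical shear wave velocity of 1.5 m/s
print(youngs_modulus_kpa(1.5))  # 6.75
```

Because stiffness scales with the square of the velocity, a small change in measured velocity produces a proportionally larger change in the reported kPa value.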

The examination was performed on the right lobe of the liver through the intercostal space in all patients. After the area of measurement was located, the examiner pressed the button of the probe to start the acquisition. Ten successful acquisitions were attempted in each patient with both probes. The success rate was automatically calculated by the machine as the ratio of the number of successful acquisitions to the total number of acquisitions. Only TE results obtained with ten valid measurements, with a success rate of at least 60% and an interquartile range (IQR) ≤ 30% of the median, were considered reliable. FibroScan failure was defined as fewer than ten valid measurements. The duration of the TE measurement with each probe was documented.
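The reliability and failure criteria stated above can be expressed as a small check. This is a minimal sketch: the function name, the returned dictionary and the quartile method (linear interpolation as implemented in Python's statistics module) are assumptions, and the device may compute the IQR differently.

```python
from statistics import median, quantiles


def assess_te_examination(valid_kpa, total_acquisitions,
                          min_valid=10, min_success_rate=0.60,
                          max_iqr_to_median=0.30):
    """Classify one TE examination as failure, reliable or unreliable."""
    n_valid = len(valid_kpa)
    if total_acquisitions <= 0 or total_acquisitions < n_valid:
        raise ValueError("total_acquisitions must be >= number of valid values")
    if any(v <= 0 for v in valid_kpa):
        raise ValueError("stiffness values must be positive")

    success_rate = n_valid / total_acquisitions
    if n_valid < min_valid:
        # Fewer than ten valid measurements counts as a failure.
        return {"failure": True, "reliable": False,
                "success_rate": success_rate,
                "median_kpa": None, "iqr_to_median": None}

    med = median(valid_kpa)
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = quantiles(valid_kpa, n=4, method="inclusive")
    iqr_to_median = (q3 - q1) / med
    reliable = (success_rate >= min_success_rate
                and iqr_to_median <= max_iqr_to_median)
    return {"failure": False, "reliable": reliable,
            "success_rate": success_rate,
            "median_kpa": med, "iqr_to_median": iqr_to_median}


# Hypothetical examination: ten valid values out of twelve acquisitions
values = [6.0, 6.2, 6.4, 6.5, 6.8, 7.0, 7.1, 7.3, 7.5, 8.0]
print(assess_te_examination(values, total_acquisitions=12))
```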

Statistical analysis

Statistical analysis was performed using SigmaPlot and SigmaStat for Windows (version 11.0, Systat Software, Inc., Germany) and BiAS for Windows (version 9.04, epsilon 2009, Frankfurt, Germany). Correlations were assessed by Spearman’s correlation coefficient. Clinical and laboratory characteristics of patients were expressed as the mean ± SD, median and range. Characteristics of measurements with both probes were compared using the Wilcoxon-Mann-Whitney paired t-test. A p value less than 0.05 was judged to be statistically significant. The Bland-Altman regression was used to assess differences between the median measurements with both probes. The results of both probes were illustrated as the median and 25th to 75th percentile values (box plot). McNemar’s test was used to compare the number of patients with 0, ≥5 and ≥10 valid measurements for both probes. The diagnostic performance of TE with both probes was assessed by receiver-operating-characteristic (ROC) curves. The ROC curve represents sensitivity versus 1 − specificity for all possible cutoff values for the prediction of each fibrosis stage. The areas under the ROC curves (AUROC) as well as the 95% CI of the AUROC were calculated including all patients. AUROC values for different diagnostic criteria for the same data set were compared with the non-parametric DeLong test. Note that AUROC values for the different methods are correlated and that this test accounts for such correlations. Therefore, it may find significant differences in diagnostic accuracy even when the confidence intervals of the single AUROC values, which ignore these correlations, overlap. For the diagnosis of fibrosis stages ≥2 versus stages <2, we also calculated the AUROC adjusted for the difference between the mean advanced and the mean non-advanced fibrosis stages (DANA) according to Poynard et al. [26], for a standardised DANA value of 2.5.
For the AUROC comparison of the two probes, results were included in the analysis irrespective of success rate and IQR.
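As an illustration of the ROC quantities used here, the AUROC can be computed as the probability that a randomly chosen patient with the target fibrosis stage has a higher stiffness value than a randomly chosen patient without it (ties counted as one half). The sketch below uses hypothetical function names and invented values; it is not the software used in the study and does not implement the confidence intervals, the DeLong test or the DANA adjustment.

```python
def auroc(stiffness_kpa, has_target_stage):
    """AUROC via pairwise comparison (Mann-Whitney formulation)."""
    pos = [x for x, y in zip(stiffness_kpa, has_target_stage) if y]
    neg = [x for x, y in zip(stiffness_kpa, has_target_stage) if not y]
    if not pos or not neg:
        raise ValueError("both groups must contain at least one patient")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(stiffness_kpa, has_target_stage, cutoff_kpa):
    """Sensitivity and specificity when values >= cutoff count as positive."""
    tp = fn = tn = fp = 0
    for x, y in zip(stiffness_kpa, has_target_stage):
        if y:
            if x >= cutoff_kpa:
                tp += 1
            else:
                fn += 1
        else:
            if x < cutoff_kpa:
                tn += 1
            else:
                fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both groups must contain at least one patient")
    return tp / (tp + fn), tn / (tn + fp)


# Invented example values, not study data
kpa = [4.0, 5.0, 6.0, 7.0, 9.0, 12.0]
f2_or_higher = [False, False, True, False, True, True]
print(auroc(kpa, f2_or_higher))
print(sensitivity_specificity(kpa, f2_or_higher, cutoff_kpa=6.5))
```

Plotting sensitivity against 1 − specificity over all cutoffs yields the ROC curve, and the pairwise formulation gives the area under that curve without explicit integration.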

Results

Fifty patients with NAFLD/NASH were included in the analysis. Patients’ characteristics at the time of study inclusion are shown in Table 1. The Spearman correlation coefficient between TE and the histological fibrosis stage was 0.46 (p < 0.005) for the M probe and 0.54 (p < 0.0001) for the XL probe. The correlation coefficient between the median measurements with the two probes was 0.71 (p < 0.0001). The Spearman correlation coefficient between TE and the NAFLD Activity Score (NAS) was 0.53 (p < 0.001) for the M probe and 0.42 (p < 0.005) for the XL probe. In addition, a significant correlation of 0.44 (p < 0.005) was found between histological fibrosis and NAS. No significant correlation was found between histological steatosis grade and TE measurement with either probe (p = 0.06 and p = 0.53) or between steatosis grade and fibrosis stage (p = 0.66).

The diagnostic accuracy (AUROC) for the diagnosis of significant fibrosis (F ≥ 2) for the M probe and the XL probe was 0.80 and 0.82, respectively. The difference was not statistically significant (p = 0.68). The AUROC for the diagnosis of severe fibrosis (F ≥ 3) was 0.75 and 0.84, respectively (p = 0.22). The AUROC for the diagnosis of liver cirrhosis was 0.91 and 0.95, respectively (p = 0.28). Details are shown in Table 2 and Fig. 1.

Table 2 Area under the ROC curve (95% confidence interval) for transient elastography with the M probe and the XL probe according to Kleiner’s fibrosis stage
Fig. 1 Receiver-operating characteristic (ROC) curves for TE with the M probe and the XL probe for diagnosis of significant fibrosis (F ≥ 2)

The AUROC for the diagnosis of steatohepatitis using the NAFLD Activity Score (NAS ≥ 5) for the M probe and the XL probe was 0.79 (95%-CI: 0.66–0.92) and 0.74 (95%-CI: 0.59–0.88), respectively. The difference was not statistically significant (p = 0.41).

The number of valid measurements was significantly higher for the XL probe compared with the M probe (9.8 vs. 8.8, p < 0.05). While in patients with a distance between the skin and the liver capsule of ≤25 mm no significant difference in the number of TE failures (<10 valid measurements) was observed between the M probe and the XL probe (1 vs. 0, p = 0.32), a significant difference was found in patients with a distance between the skin and the liver capsule of >25 mm (6 vs. 1, p < 0.05). The mean duration of TE measurement was 3.4 ± 1.4 min (median: 3.0 min; range: 2.1–10 min) for the M probe and 3.12 ± 2.1 min (median: 2.2 min; range: 1.5–12 min) for the XL probe. The difference was statistically significant (p < 0.05).

The median liver stiffness measurement using the XL probe was significantly lower than when using the M probe (6.9 kPa vs. 8.4 kPa, p < 0.001). Details are shown in Table 3. A Bland-Altman regression showed that the mean difference was 1.68 kPa (95% CI: 1.06–2.31 kPa) and did not vary for the different fibrosis stages (Fig. 2).

Table 3 Comparison of the performance of the M probe with the XL probe
Fig. 2 Box plots of transient elastography using the M probe (dark grey) and the XL probe (light grey) for each fibrosis stage. The bottom and top of the boxes represent the first and third quartiles, respectively. The length of the box represents the interquartile range within which 50% of values are located. The line through the middle of each box represents the median. The error bars mark the minimum and maximum values (range)

Measurement in patients with BMI ≥ 30 kg/m²

Seventeen of the 50 included patients (34%) had a BMI ≥ 30 kg/m². Of these 17 patients, ten valid TE measurements were obtained with the M probe in only 11 (65%) and with the XL probe in 16 (94%). The only patient without ten valid measurements with the XL probe had a BMI of 42 kg/m² and a distance between the skin and the liver capsule of 45 mm. Successful measurement with the XL probe was possible in 5/6 (83%) of the patients without ten valid measurements with the M probe. In 7/17 (41%) patients measured with the M probe and in 2/17 (12%) patients measured with the XL probe, the success rate was <60% (p < 0.05). In 7/17 (41%) patients measured with the M probe and in 2/17 (12%) patients measured with the XL probe, the ratio of IQR/median was >30% (p < 0.05).

Discussion

To our knowledge, this pilot study is the first to evaluate a new transient elastography probe for obese patients (XL probe) using liver histology as a reference method. The results of the study show that reliable measurement of liver fibrosis is possible using the XL probe with diagnostic accuracy of 82% for the diagnosis of significant fibrosis and 91% for the diagnosis of cirrhosis. The advantage of the XL probe is that significantly more obese patients can be examined with this method. Only one previous study evaluated the feasibility of the XL probe in 100 obese patients with BMI ≥ 30 kg/m²; however, no histology was available for these patients, and the accuracy of the new probe was unclear until now. Nevertheless, the study reported that 60% more patients were measurable with the XL probe than with the M probe [21]. In the present pilot study, 83% of patients with measurement failure with the M probe could be examined without failure with the XL probe. In the largest study on transient elastography in NAFLD, successful measurement could not be obtained in 25% of obese patients with a BMI ≥ 30 kg/m² [20]. These are the patients who will benefit most from the new XL probe. While the measurement depth of the standard probe (M probe) begins 25 mm below the skin, the measurement depth of the XL probe begins 35 mm below the skin. Therefore, it seems logical that the XL probe is especially useful in patients with a distance between the skin and liver capsule between 25 and 35 mm. This assumption is supported by the results of the present study, which showed no significant difference in the number of TE failures between the two probes in patients with a distance between the skin and liver capsule of ≤25 mm and a significant difference in patients with a distance of >25 mm. In addition, the only measurement failure of the XL probe was in a patient with a distance between the skin and liver capsule of 45 mm.

The median liver stiffness measured with the XL probe was significantly lower than that measured with the M probe (6.9 vs. 8.4 kPa). The reason might be that non-hepatic tissue is involved in the measurement with the M probe in patients with a skin-to-capsule distance greater than 25 mm. Therefore, the existing cutoffs defined using the M probe cannot be used for the interpretation of liver stiffness measurements obtained with the XL probe. The present pilot study is too small to define new cutoffs for the XL probe. Larger studies are awaited. In the meantime, until new cutoffs are available for the XL probe, the interpretation of measurements obtained with the commercially available XL probe should take into account that the mean difference between the two probes is around 1–2 kPa.

Two meta-analyses evaluating transient elastography for the staging of liver fibrosis have reported mean diagnostic accuracies of 84–87% for the diagnosis of significant fibrosis and 94–96% for the diagnosis of liver cirrhosis [15, 16]. Comparable results were shown in the three studies evaluating transient elastography in adults and children with NAFLD [18–20]. The largest study by Wong et al. [20] including 246 patients with NAFLD reported a diagnostic accuracy of 84% for the diagnosis of significant fibrosis and 97% for the diagnosis of liver cirrhosis. The results of the present study are in accordance with this large study. In addition, Wong et al. [20] demonstrated that steatosis and BMI do not increase liver stiffness measurement of TE. Also, in the present study, no such influence was observed for either probe.

In addition to the staging of liver fibrosis, the group of Kleiner et al. [24] proposed a histological NAFLD Activity Score (NAS), which specifically includes only features of active injury that are potentially reversible in the short term. The aim of NAS was to assess overall histological changes, which can also be used to evaluate histological changes after therapeutic intervention trials. The score is defined as the unweighted sum of the scores for steatosis, lobular inflammation and ballooning ranging from 0 to 8. Patients with NAS ≥ 5 were diagnosed as having steatohepatitis in most cases [24]. The AUROC of transient elastography for the diagnosis of steatohepatitis according to NAS was 0.79 for the M probe and 0.74 for the XL probe. However, a significant correlation was also found for histological fibrosis and NAS. Larger studies are certainly necessary to further evaluate whether disease regression and possibly progression according to NAS can be recognised and measured with transient elastography.

Our study has several limitations. Liver biopsy was used as the reference method, and the accuracy of liver biopsy is limited because of significant intra- and interobserver variability and sampling errors [10–13]. A recent study has demonstrated that error in liver biopsy results makes it impossible to distinguish a perfect non-invasive marker from less valid assays [27]. This supports the assumption that the accuracy of non-invasive markers might be underestimated when liver biopsy is used as the reference method. The ultimate validation of liver fibrosis as a marker of liver injury is its prognostic value in terms of morbidity and mortality; long-term studies evaluating these endpoints are awaited. Another limitation of our study is that liver biopsies were performed up to 18 months before TE measurement. However, as the mean progression rate of liver fibrosis in untreated patients was estimated to be 0.085–0.120 fibrosis stages per year [23], the changes over 18 months were expected to be minimal. In addition, biopsies shorter than the usual standard of 15 mm were included in the present study if at least six portal tracts were present. Nevertheless, this was a comparative study between the different TE probes, in which the limitations of liver biopsy affected both methods equally.

In summary, transient elastography using the XL probe for obese patients can be performed with comparable diagnostic accuracy to that of the standard probe (M probe) with the advantage of enabling the examination of significantly more obese patients. Large prospective multicentre studies are necessary to develop new cutoffs for the staging of liver fibrosis using the XL probe.