Introduction

Ultrasound (US) elastography is a non-invasive technique used to measure liver stiffness in patients with chronic liver disease, for grading liver fibrosis or predicting portal hypertension. Shear wave elastography (SWE) is the most widely used US elastography technique for measuring liver stiffness. SWE may be performed with any of three techniques: transient elastography (TE), point shear wave elastography (pSWE), and two-dimensional shear wave elastography (2D-SWE). TE (FibroScan; Echosens) was the first of these techniques to be developed and is the most widely used. In TE, a 50-Hz mechanical impulse is delivered to the skin to generate a shear wave, and the velocity of the shear wave propagating through the liver tissue is measured [1, 2]. pSWE and 2D-SWE are comparatively new techniques that use an acoustic radiation force impulse (ARFI) to deform the liver tissue and generate shear waves (100–500 Hz) [3]. With these techniques, the operator can freely place the region of interest (ROI) under gray-scale US guidance using conventional US probes, which is not possible with TE. In pSWE, the average shear wave speed is measured within an ROI of fixed size, whereas 2D-SWE allows the operator to adjust the ROI size and provides color elasticity maps [3].
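TE reports liver stiffness as a Young's modulus in kilopascals, whereas some pSWE systems (e.g., Virtual Touch tissue quantification) report shear wave speed in m/s. The two are commonly related by the approximation E = 3ρc², which assumes linearly elastic, isotropic, incompressible tissue with a density ρ of about 1000 kg/m³. The sketch below illustrates only that textbook conversion under those assumptions; the function names are hypothetical, and vendors may apply their own processing.

```python
import math

# Assumed liver tissue density (kg/m^3), approximated as water-like soft tissue.
LIVER_DENSITY_KG_M3 = 1000.0


def speed_to_youngs_kpa(speed_m_s: float, density: float = LIVER_DENSITY_KG_M3) -> float:
    """Convert shear wave speed c (m/s) to Young's modulus E (kPa) via E = 3*rho*c^2.

    Assumes linearly elastic, isotropic, incompressible tissue.
    """
    if speed_m_s < 0:
        raise ValueError("shear wave speed must be non-negative")
    youngs_pa = 3.0 * density * speed_m_s ** 2
    return youngs_pa / 1000.0  # Pa -> kPa


def youngs_kpa_to_speed(youngs_kpa: float, density: float = LIVER_DENSITY_KG_M3) -> float:
    """Inverse conversion: c = sqrt(E / (3*rho)), with E given in kPa."""
    if youngs_kpa < 0:
        raise ValueError("Young's modulus must be non-negative")
    return math.sqrt(youngs_kpa * 1000.0 / (3.0 * density))


print(speed_to_youngs_kpa(1.5))  # 6.75
```

Under these assumptions, a shear wave speed of 1.5 m/s corresponds to about 6.75 kPa.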

Although many studies have described US elastography as an effective diagnostic method in adults [1, 4,5,6], fewer studies have investigated its use in pediatric and adolescent populations [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. Furthermore, beyond diagnostic accuracy, the ability to obtain successful and reliable measurements is also important. Notably, examining children poses specific challenges, such as small body size and the potential inability of the patient to hold his or her breath. Recently, a large population study [43] reported the rates of technical failure and unreliable measurement of US elastography in adults. However, this issue has been investigated in pediatric and adolescent populations only in small-scale studies [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42].

In this study, we performed a systematic review and meta-analysis to evaluate the technical performance of US elastography in pediatric and adolescent patients.

Materials and methods

This systematic review and meta-analysis was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [44].

Literature search

MEDLINE and EMBASE databases were searched up to July 12, 2017, to identify relevant studies. The following search terms were used: ((children) OR (pediatric*) OR (paediatric*) OR (adolescent*)) AND ((liver) OR (hepatic)) AND ((elastography) OR (“transient elastography”) OR (TE) OR (fibroscan) OR (acoustic radiation force impulse imaging) OR (ARFI) OR (Virtual Touch tissue quantification) OR (VTQ) OR (Virtual Touch tissue imaging quantification) OR (VTIQ) OR (shear wave elastography) OR (shearwave elastography) OR (shear-wave elastography) OR (SWE) OR (Supersonic) OR (Aixplorer) OR (shear wave speed imaging)). Only English-language articles were evaluated. The bibliographies of all selected articles were screened to identify additional relevant publications.

Two reviewers (D.W.K. and H.M.Y., with 2 and 5 years of experience in systematic reviews and meta-analyses, respectively) independently performed the literature search, study selection, and data extraction. A third reviewer (Y.A.C., with 24 years of experience in pediatric radiology) resolved all disagreements.

Exclusion criteria were as follows: (a) case reports or series including < 10 patients; (b) reviews, editorials, letters, comments, or conference abstracts; (c) studies using US elastography modalities other than shear wave techniques (e.g., strain elastography); and (d) partially overlapping study populations.

Original articles investigating shear wave elastography (TE, pSWE, or 2D-SWE) for measuring liver stiffness in pediatric and adolescent patients (more than 95% of the study population younger than 20 years) were selected for analysis. Retrieved studies were initially screened by title and abstract for potential eligibility, and the full texts were subsequently reviewed for final inclusion.

Data extraction

Data were extracted using a standardized form related to (a) study characteristics: authors, year of publication, institution, country of origin, duration of patient recruitment, and study design (prospective vs. retrospective); (b) demographic and clinical characteristics: patient number, male/female ratio, age (mean age and range), and etiology; (c) technical characteristics: type of shear wave elastography (TE, pSWE, or 2D-SWE), model, probe, number of measurements, representative values, number of readers, and presence of reader blinding; (d) outcomes: rate of technical failure and/or unreliable measurement for each type of SWE.

Technical failure was defined as obtaining no or few values across all acquisitions. By contrast, there is no consensus across the different SWE techniques on the definition of an unreliable measurement. According to the manufacturer’s recommendations [45], TE measurements are considered unreliable when any of the following applies: (a) fewer than 10 valid measurements; (b) a success rate (valid shots/total number of shots) < 60%; or (c) an interquartile range (IQR) ≥ 30% of the median liver stiffness value. No clear guideline defines unreliable measurement for pSWE and 2D-SWE. Therefore, for all articles, we applied the definition used in each individual study.
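The manufacturer’s three TE criteria translate directly into a per-examination check. The following is a minimal sketch, assuming the per-shot stiffness values of the valid shots and the total shot count are available; the function name is hypothetical. The quartile method used by the device software is not stated in the text, so the IQR computed here with Python’s `statistics.quantiles` (default exclusive method) is an assumption and may differ slightly from the device’s value.

```python
from statistics import median, quantiles
from typing import Sequence


def te_is_reliable(valid_kpa: Sequence[float], total_shots: int) -> bool:
    """Apply the three TE reliability criteria (hypothetical helper).

    valid_kpa: liver stiffness values (kPa) from the valid shots.
    total_shots: all shots attempted, valid or not.
    """
    n_valid = len(valid_kpa)
    # (a) at least 10 valid measurements
    if n_valid < 10:
        return False
    # (b) success rate (valid shots / total shots) of at least 60%
    if total_shots <= 0 or n_valid / total_shots < 0.60:
        return False
    # (c) IQR below 30% of the median; the quartile method is an assumption
    q1, _, q3 = quantiles(valid_kpa, n=4)
    med = median(valid_kpa)
    if med <= 0:
        return False
    return (q3 - q1) / med < 0.30
```

For example, ten identical readings of 5.0 kPa from 12 shots pass all three criteria, whereas the same ten readings from 20 shots fail the success-rate criterion.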

Data synthesis and analysis

The primary outcome of our systematic review and meta-analysis was the pooled proportion of technical failures and/or unreliable measurements of SWE. Proportions were logit-transformed and pooled in a random effects meta-analysis of single proportions, using the inverse variance method to calculate study weights [46,47,48]. For individual studies, confidence intervals (CIs) were obtained with the Clopper–Pearson method, and a continuity correction of 0.5 was applied to studies with zero cell frequencies. Heterogeneity was evaluated using (1) Cochran’s Q test for the summary estimates (p < 0.05 indicating heterogeneity) and (2) the Higgins inconsistency index (I²) (≥ 50% indicating significant heterogeneity) [49, 50]. Funnel plots were used to visually assess publication bias, and Egger’s test was used to determine statistical significance (p < 0.10 indicating significant bias) [51]. Meta-regression analysis was performed for the unreliable measurement of TE to explore potential causes of heterogeneity, using the following covariates: (a) study population size (< 100 vs. ≥ 100); (b) etiology (known chronic liver disease vs. others); (c) mean or median age (< 9 vs. ≥ 9 years); (d) transducer (with vs. without a pediatric S probe); (e) reporting of the number of readers; and (f) readers’ blinding to pathologic results.
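The pooling steps described above (logit transformation, inverse variance weights, a random effects model, a 0.5 continuity correction for zero cells, Cochran’s Q, and I²) can be sketched as follows. This is a minimal stdlib-only illustration, not the R “meta” package’s implementation: the DerSimonian–Laird estimator for the between-study variance τ² is an assumption, since the text does not name the estimator, and the function names are hypothetical. The per-study Clopper–Pearson intervals and the χ² p value for Q (k − 1 degrees of freedom) are omitted.

```python
import math
from typing import Dict, Sequence, Tuple

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def logit_and_variance(events: int, n: int, cc: float = 0.5) -> Tuple[float, float]:
    """Logit of a proportion and its approximate variance 1/e + 1/(n - e).

    The continuity correction cc is added only when a study has 0 or n events.
    """
    e, total = float(events), float(n)
    if events == 0 or events == n:
        e += cc
        total += 2 * cc
    return math.log(e / (total - e)), 1.0 / e + 1.0 / (total - e)


def expit(x: float) -> float:
    """Back-transform a logit to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions(events: Sequence[int], totals: Sequence[int]) -> Dict[str, float]:
    """Inverse variance random effects pooling on the logit scale.

    tau^2 uses the DerSimonian-Laird estimator (an assumption).
    """
    ys, vs = zip(*(logit_and_variance(e, n) for e, n in zip(events, totals)))
    k = len(ys)
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    # Cochran's Q: weighted squared deviations from the fixed effect estimate
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    # Higgins I^2 = (Q - df) / Q, truncated at zero
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": expit(y_re),
        "ci_low": expit(y_re - Z_975 * se_re),
        "ci_high": expit(y_re + Z_975 * se_re),
        "Q": q,
        "df": float(df),
        "tau2": tau2,
        "I2": i2,
    }
```

Pooling on the logit scale keeps the back-transformed CI inside 0–1, which matters for the low failure proportions reported here.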

One reviewer (D.W.K.) performed all statistical analyses using the “meta” package in R version 3.4.1 (R Foundation for Statistical Computing).

Results

Literature search and selection

Figure 1 presents a flowchart of the literature selection process. A total of 1184 studies were retrieved from Ovid-MEDLINE (n = 405) and EMBASE (n = 779), of which 414 duplicates were excluded. After title and abstract screening, a further 693 studies were removed: 48 review articles; 13 case reports/series; 262 conference abstracts; 14 letters, editorials, comments, or notes; 339 studies outside the field of interest; and 17 studies of adult populations. After full-text review of the remaining 77 articles, 8 were excluded: three were case reports/series, two used other US elastography techniques (strain elastography), and three targeted adult populations. Ultimately, 69 studies satisfied our criteria.

Fig. 1 Flow diagram for study selection

Of the 69 studies, 29 (42.0%) lacked information on technical performance. Furthermore, in four studies, we could not obtain the rate of technical failure or unreliable measurement at the per-patient level, because they reported only the rate of successful measurements among all measurements (per-measurement level) rather than the rate of successfully measured patients (see Supplementary References). Therefore, 36 articles [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42] were included in the quantitative synthesis.

Characteristics of included studies

Table 1 summarizes the demographic characteristics of the included study populations. Twenty-three studies [7,8,9,10, 13,14,15, 17,18,19,20,21,22, 26,27,28,29, 33, 35, 37, 39, 40, 42] were prospective, and 3 [31, 38, 41] were retrospective. Patient ages ranged from newborn to 23 years. Table 2 summarizes the technical characteristics of SWE in the included studies. Regarding SWE technique, 22 [8, 10,11,12,13, 17, 18, 23,24,25, 27, 28, 30,31,32,33,34,35,36, 38, 40], 10 [9, 10, 15, 16, 19, 26, 28, 29, 35, 39], and 9 studies [7, 10, 14, 15, 20, 22, 37, 41, 42] used TE, pSWE, and 2D-SWE, respectively. One study used all three techniques [10], two studies used TE and pSWE [28, 35], and one study used pSWE and 2D-SWE [15]. Regarding TE transducers, 12 studies [8, 11, 17, 23,24,25, 28, 30, 35, 36, 38, 40] used both a standard M probe and a pediatric S probe, and 7 studies [10, 12, 13, 18, 31,32,33] did not use the pediatric S probe.

Table 1 Demographic characteristics for included study populations
Table 2 Technical characteristics of shear wave elastography used in included studies

Systematic review of technical performance: quality of reporting

Among the TE studies, technical failure was reported in 3 of 22 studies (14%) [24, 33, 34] and unreliable measurement in 21 of 22 studies (95%) [8, 10,11,12,13, 17, 18, 21, 23,24,25, 27, 28, 30,31,32,33, 35, 36, 38, 40]. Rates of technical failure ranged from 1.0 to 9.5%, and rates of unreliable measurement ranged from 0 to 28.9% (Supplementary Table 1). Among the pSWE studies, technical failure was reported in 6 of 10 studies (60%) [15, 16, 19, 26, 29, 39] and unreliable measurement in 4 of 10 studies (40%) [9, 10, 28, 35]. Rates of technical failure ranged from 0 to 21.0%. One of the four studies [28] reported a rate of unreliable measurement of 5.4% (Supplementary Table 2); the other three [9, 10, 35] reported a rate of 0%. For 2D-SWE, rates of technical failure were reported in eight of nine studies (89%) [7, 10, 14, 15, 20, 22, 37, 42] and ranged from 0 to 22.6%; no technical failure occurred in five [7, 10, 14, 20, 37] of those eight studies. Unreliable measurement with 2D-SWE was reported in one of nine studies [41], at a rate of 3.1% (Supplementary Table 3). Table 3 summarizes each study’s criteria for reliable measurement. For the unreliable measurement of TE, 16 [8, 10, 12, 17, 18, 23,24,25, 27, 28, 31,32,33, 35, 36, 40] of the 21 studies followed the reliability criteria recommended by the manufacturer.

Table 3 Criteria for reliable measurement in included studies

Meta-analysis

Meta-analysis was performed for outcomes reported by more than five studies: the unreliable measurement of TE, technical failure during pSWE, and technical failure during 2D-SWE.

Meta-analysis of the unreliable measurement of TE included 16 studies [8, 10, 12, 17, 18, 23,24,25, 27, 28, 31,32,33, 35, 36, 40] (3030 patients) that used the manufacturer’s recommended criteria, with a total of 397 unreliable measurements. Using a random effects model, the pooled proportion was 12.1% (95% CI, 9.4–15.5%) (Fig. 2). Significant heterogeneity was found by Cochran’s Q test (p < 0.01) and the Higgins I² statistic (I² = 79%). Significant publication bias was indicated by the funnel plot and Egger’s test (p = 0.06) (Supplementary Figure 1).
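Egger’s test, reported above for the TE analysis, regresses each study’s standardized effect (effect/SE) on its precision (1/SE) and tests whether the intercept differs from zero; a nonzero intercept suggests funnel plot asymmetry. The sketch below is a minimal ordinary least squares version, assuming the effects are study logits with their standard errors; the function name is hypothetical. The two-sided p value would come from a t distribution with k − 2 degrees of freedom (e.g., `2 * scipy.stats.t.sf(abs(t), df)`), which this stdlib-only sketch leaves out.

```python
import math
from typing import Dict, Sequence


def egger_test(effects: Sequence[float], std_errors: Sequence[float]) -> Dict[str, float]:
    """Egger's regression: standardized effect (y/se) regressed on precision (1/se).

    Returns the intercept, its standard error, the t statistic, and degrees of freedom.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least 3 studies")
    x = [1.0 / s for s in std_errors]                  # precision
    z = [y / s for y, s in zip(effects, std_errors)]   # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with k - 2 degrees of freedom
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = k - 2
    se_int = math.sqrt(sse / df * (1.0 / k + x_bar ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "se": se_int, "t": t, "df": float(df)}
```

With only 16 studies, the test has limited power, which is one reason a more lenient threshold (p < 0.10) is used for publication bias.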

Fig. 2 Forest plots for pooled proportions of unreliable TE measurements

Using a random effects model (Fig. 3), the pooled proportion of technical failure during pSWE, based on 59 events among 1364 patients, was 4.1% (95% CI, 1.5–10.7%), with significant heterogeneity (Cochran’s Q test, p < 0.01; Higgins I² statistic, I² = 85%).

Fig. 3 Forest plots for pooled proportions of technical failure during pSWE

Using a random effects model (Fig. 4), the pooled proportion of technical failure during 2D-SWE was 2.2% (95% CI, 0.6–7.6%), with significant heterogeneity (Cochran’s Q test, p < 0.01; Higgins I² statistic, I² = 83%).

Fig. 4 Forest plots for pooled proportions of technical failure during 2D-SWE

The pooled proportion of technical failure was lower during 2D-SWE than during pSWE, although the difference was not statistically significant (p = 0.61).
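One common way to compare two pooled proportions is a test for subgroup differences on the logit scale; with two subgroups, the between-subgroup Q statistic equals z² with 1 degree of freedom, so it reduces to a Wald z test. The sketch below assumes the two pooled logits are independent and that each standard error comes from its own subgroup’s random effects model; the function name is hypothetical. The analysis in R may handle between-study variance differently, so this sketch is not expected to reproduce the reported p value exactly.

```python
import math


def compare_pooled_logits(y1: float, se1: float, y2: float, se2: float) -> float:
    """Two-sided p value for the difference between two independent pooled logits.

    For two subgroups, Q_between = (y1 - y2)^2 / (se1^2 + se2^2) = z^2 (1 df), so the
    chi-square p value equals the two-sided normal p value erfc(|z| / sqrt(2)).
    """
    z = (y1 - y2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Identical pooled estimates give p = 1, and a difference of 1.96 combined standard errors gives p ≈ 0.05.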

Meta-regression analysis

In the meta-regression analysis for the unreliable measurement of TE, study heterogeneity was influenced by study population size (p = 0.01) and readers’ blinding to pathologic results (p = 0.02). Other covariates, including etiology (p = 0.92), age (p = 0.63), transducer (p = 0.06), and reporting of the number of readers (p = 0.19), did not significantly influence the rate of unreliable measurements (Supplementary Table 4).
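Each covariate in this meta-regression is binary (e.g., study size < 100 vs. ≥ 100), so each test asks whether the pooled logit differs between two groups of studies. The following is a minimal weighted least squares sketch, assuming a single 0/1 covariate, a residual between-study variance τ² that has already been estimated, and a Wald z test; the function name is hypothetical, and the R analysis may use a different τ² estimator or test statistic.

```python
import math
from typing import Dict, Sequence


def metareg_binary(ys: Sequence[float], vs: Sequence[float],
                   xs: Sequence[int], tau2: float) -> Dict[str, float]:
    """WLS meta-regression of study logits ys on one 0/1 covariate xs.

    Weights are 1 / (v_i + tau2); tau2 is supplied by the caller (assumed estimated).
    b1 is the difference in logit between the covariate groups.
    """
    w = [1.0 / (v + tau2) for v in vs]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, xs))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, xs))
    swy = sum(wi * yi for wi, yi in zip(w, ys))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, xs, ys))
    det = sw * swxx - swx ** 2  # determinant of X'WX
    if det <= 0:
        raise ValueError("covariate must take both values 0 and 1")
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swxx * swy - swx * swxy) / det
    se_b1 = math.sqrt(sw / det)  # from the inverse of X'WX
    p = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))  # two-sided Wald p value
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "p": p}
```

Because the covariate is binary, b1 is simply the weighted difference in mean logit between the two groups of studies.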

Discussion

According to our systematic review, 40 of the 69 studies investigating SWE in pediatric and adolescent patients provided information on technical performance, mainly the rate of unreliable measurements in TE studies and the rate of technical failure in pSWE and 2D-SWE studies. In brief, the pooled proportion of unreliable TE measurements was 12.1%, and the pooled proportion of technical failure tended to be lower during 2D-SWE than during pSWE (2.2% vs. 4.1%, p = 0.61).

The US Food and Drug Administration has approved US elastography as a commercially available diagnostic device. However, US elastography has yet to be validated by the clinical community as a quantitative biomarker of liver fibrosis in clinical practice. For clinical validation, the assessment of technical success and measurement variability is as important as the assessment of diagnostic performance. Nevertheless, 29 of 69 studies (42.0%) did not report technical performance, a considerable proportion given its importance.

The contrast in technical performance reporting among techniques is likely linked to the standardization of reliability criteria. For TE, the manufacturer recommends reliability criteria that are widely accepted across studies: at least 10 valid measurements with a success rate ≥ 60% and an IQR/median ratio < 30% are considered reliable. Unlike TE, pSWE and 2D-SWE lack standard reliability criteria. Therefore, only a few pSWE and 2D-SWE studies reported unreliable measurements, using variable reliability criteria; most reported technical failure instead. Collaborative efforts between academia and industry to reach a consensus and standardize SWE measurements are currently underway. The Society of Radiologists in Ultrasound recently published a consensus statement on the technical aspects of US elastography, suggesting 10 measurements at the same hepatic location with an IQR of 30% or less of the median value [4]. The World Federation for Ultrasound in Medicine and Biology recommends 5 to 10 measurements for pSWE and 4 measurements for 2D-SWE [1]. More recently, the 2017 European Federation of Societies for Ultrasound in Medicine and Biology guidelines stated that 3 measurements are sufficient to obtain consistent results with 2D-SWE [52]. Of note, a large number of measurements prolongs the examination, which young patients may tolerate poorly [37]. Thus, conclusions valid for adult populations may not necessarily be extrapolated to pediatric and adolescent populations. In this regard, Shin et al [37] evaluated the optimal number of 2D-SWE acquisitions and observed that three measurements are sufficient to measure liver stiffness in children over 6 years old. However, future studies are needed to define optimal reliability criteria for pSWE and 2D-SWE and to determine the US elastography technique of choice in pediatric and adolescent populations.

Differences in the technical principles of the SWE techniques likely contribute to differences in their technical performance. We therefore expected pSWE and 2D-SWE to have better technical performance than TE, given their technical advantages [3]. First, unlike in TE, ROI placement is guided by conventional B-mode US, which allows structures such as vessels, the gallbladder, and focal lesions to be avoided. Second, pSWE and 2D-SWE generate the shear wave within the tissue of interest by ARFI, whereas TE applies a mechanical impulse to the skin surface and tracks the propagation of the resulting shear wave. Thus, structures between the skin surface and the liver capsule (e.g., perihepatic ascites or fat) have a smaller impact on liver stiffness measurement in pSWE and 2D-SWE than in TE. Last, the color elasticity maps of 2D-SWE help avoid artifacts induced by stress, which tends to concentrate near boundaries [42, 53].

As in adults, high BMI or obesity in children and adolescents is associated with increased rates of technical failure and unreliable measurement, owing to the excessive thickness of subcutaneous adipose tissue [12, 15]. Moreover, additional factors can affect the likelihood of technical failure and/or unreliable measurements in children. First, children with narrow intercostal spaces might have higher failure rates when liver stiffness is measured with a standard adult probe. For small children, the pediatric S probe (frequency, 5 MHz; probe diameter, 5 mm) in TE can overcome the limitations of the standard M probe (frequency, 3.5 MHz; probe diameter, 7 mm), especially in children with a thorax perimeter smaller than 75 cm [10, 17, 24, 38]. Specifically, two modes of the S probe, S1 and S2, are available, with measurement depths of 1.5–4 cm in S1 mode and 2–5 cm in S2 mode [36]. Second, children may be less cooperative during US elastography, and continuous movement or crying makes measurements less accurate [12, 14]. Accordingly, young children (< 24 months), who are prone to agitation, have lower success rates [17]. In such cases, general anesthesia would reduce excessive movement and increase success rates, although it is invasive. Last, breath holding can facilitate obtaining valid measurements, but it is difficult, if not impossible, in young children. Reports on whether breath holding affects successful measurement in pediatric patients are mixed. Some studies found that the variability caused by breathing was not significant [19, 29, 37] and raised the concern that breath holding in children would cause an irregular and variable breathing rhythm, resulting in invalid measurements [19]. However, a recent study showed lower liver stiffness values with the free-breathing technique than with the breath-holding technique in 2D-SWE [54].

We therefore included age and S probe use as covariates in the meta-regression; however, neither affected study heterogeneity. Unexpectedly, studies that used the S probe showed a higher rate of unreliable measurements than those that did not, although the difference was not statistically significant (14.5% vs. 7.6%, p = 0.06). These conflicting results may reflect the heterogeneity of the populations included in our study. Future large-scale studies are therefore needed to evaluate the factors influencing successful measurement in pediatric patients.

One limitation of our study is the small number of included studies, which prevented meta-analysis for some SWE techniques and limited our ability to compare rates of technical failure or unreliable measurement among techniques. Nevertheless, we included all available studies and provided an overview of the current evidence on technical performance in the pediatric population. Another limitation is the significant heterogeneity in our meta-analyses. Although study population size and reader blinding to the reference standard affected heterogeneity, other patient or technical factors may also contribute. For example, ROI placement in pSWE and 2D-SWE may be operator dependent, so sufficient experience is required to obtain consistent measurements [55]. Thus, a large cohort study is warranted to evaluate factors that may affect technical performance.

In summary, TE studies seldom reported rates of technical failure but did report rates of unreliable measurement, with a pooled proportion of 12.1%. Conversely, pSWE and 2D-SWE studies often reported rates of technical failure, which were comparable between the two techniques, but rarely reported rates of unreliable measurement. Considering the importance of technical performance for the clinical validation of SWE, the number of and reasons for technical failures and unreliable measurements should be reported in future studies. Additionally, further efforts are needed to standardize reliability criteria for SWE.