Introduction

Biliary atresia (BA) is a disease of unknown etiology that affects both the extrahepatic and the intrahepatic bile ducts. It leads to progressive obliteration of the biliary tree [1], causing severe cholestasis and biliary cirrhosis and, ultimately, death in the first years of life. The recommended treatment of BA is sequential: in the first two months of life, the Kasai portoenterostomy, or one of its technical variants, aims to restore bile flow to the intestine; if the operation fails and/or life-threatening complications of biliary cirrhosis develop, liver transplantation (LT) may eventually be needed [2]. The current consensus is that the earlier the Kasai portoenterostomy is performed, the better the outcome. Early diagnosis of BA is therefore very important for the long-term transplant-free survival of infants with BA. The objective of our study was to analyze the accuracy of different methods for diagnosing BA.

Methods

Literature search

We searched the PubMed, EMBASE and Web of Science databases for articles published up to July 2017, using the following search strategy: ((((((diagnosis[Title/Abstract]) OR diagnose[Title/Abstract]) OR diagnostic[Title/Abstract]) OR screening[Title/Abstract])) AND (((((((((((((((((((((Ultrasonograph[Title/Abstract]) OR Echography[Title/Abstract]) OR Ultrasound Imaging[Title/Abstract]) OR ultrasound[Title/Abstract]) OR Imaging, Ultrasound[Title/Abstract]) OR Ultrasound Imagings[Title/Abstract]) OR Sonography, Medical[Title/Abstract]) OR Medical Sonography[Title/Abstract]) OR Diagnostic Ultrasound[Title/Abstract]) OR Ultrasound, Diagnostic[Title/Abstract]) OR Echotomography[Title/Abstract]) OR Diagnosis, Ultrasonic[Title/Abstract]) OR Diagnosis, Ultrasonic[Title/Abstract]) OR Ultrasonic Tomography[Title/Abstract]) OR “Ultrasonography”[Mesh])) OR ((((((((Cholangiopancreatography, Magnetic Resonance[Title/Abstract]) OR Magnetic Resonance Cholangiopancreatography[Title/Abstract]) OR MRCP[Title/Abstract]) OR MR Cholangiopancreatography[Title/Abstract]) OR Magnetic Resonance Cholangiography[Title/Abstract]) OR MR Cholangiography[Title/Abstract])) OR “Cholangiopancreatography, Magnetic Resonance”[Mesh])) OR (((((acholic stool[Title/Abstract]) OR pale stool[Title/Abstract]) OR clay stool[Title/Abstract]) OR stool color card[Title/Abstract]) OR stool colour card[Title/Abstract])) OR (((((((Liver Function Tests[Title/Abstract]) OR Function Test, Liver[Title/Abstract]) OR Function Tests, Liver[Title/Abstract]) OR Liver Function Test[Title/Abstract]) OR Test, Liver Function[Title/Abstract]) OR Tests, Liver Function[Title/Abstract]) OR “Liver Function Tests”[Mesh])) OR ((((Hepatobiliary scintigraphy[Title/Abstract]) OR Technetium Tc 99m Lidofenin[Title/Abstract]) OR HIDA[Title/Abstract]) OR hepatobiliary scintiscanning[Title/Abstract])) OR (((((liver[Title/Abstract]) OR hepatic[Title/Abstract]) OR hepatology[Title/Abstract])) AND ((((biopsy[Title/Abstract]) OR pathology[Title/Abstract]) OR pathological[Title/Abstract]) OR histopathology[Title/Abstract])))) AND (((((((((((Biliary Atresia[Title/Abstract]) OR Biliary Atresia, Extrahepatic[Title/Abstract]) OR Atresia, Extrahepatic Biliary[Title/Abstract]) OR Atresias, Extrahepatic Biliary[Title/Abstract]) OR Biliary Atresias, Extrahepatic[Title/Abstract]) OR Extrahepatic Biliary Atresia[Title/Abstract]) OR Extrahepatic Biliary Atresias[Title/Abstract]) OR Atresia, Biliary[Title/Abstract]) OR Familial Extrahepatic Biliary Atresia[Title/Abstract]) OR Idiopathic Extrahepatic Biliary Atresia[Title/Abstract]) OR “Biliary Atresia”[Mesh]).

Inclusion criteria

The inclusion criteria for the identified articles were as follows: (1) diagnostic test accuracy (DTA) studies evaluating the sensitivity and specificity of at least one of B-mode ultrasonography (B-US), magnetic resonance cholangiopancreatography (MRCP), acholic stool, serum liver function test, hepatobiliary scintigraphy and percutaneous liver biopsy, (2) articles published as full text in English and (3) studies with sufficient information for analysis.

Exclusion criteria

The exclusion criteria for the identified articles were as follows: (1) letters, reviews, case reports, abstracts (including conference abstracts), editorials and expert opinion reviews, (2) studies in which the sensitivity and specificity data were incorrect or insufficient for analysis, or were evaluated by more than one researcher without a consensus, (3) screening studies in a large population without cholestasis and (4) studies with overlapping cases and data. When the cases of two or more studies overlapped, we gave priority to the study that evaluated more diagnostic methods or, if the methods were the same, to the study with more cases.

Screening

Screening was performed in duplicate, independently, by two researchers at all stages. Disagreements in study selection between the two reviewers were resolved by consensus.

Data extraction

Data were extracted on study characteristics (e.g. study period, design, sample size and location of the study), study sample characteristics (e.g. age at diagnosis) and diagnostic data (e.g. true positives, true negatives, false positives, false negatives, sensitivity and specificity). If a study evaluated two or more criteria for a diagnostic method, we extracted the data for the most commonly used criterion.
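As an illustration of the diagnostic data listed above, the following minimal Python sketch shows how sensitivity and specificity follow from the four cells of a 2 × 2 table. The class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results against the reference standard (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        # Proportion of diseased patients with a positive index test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-diseased patients with a negative index test.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for one study, chosen only for illustration.
example = TwoByTwo(tp=45, fp=5, fn=5, tn=45)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```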

Quality assessment

The quality of the included studies was assessed by two researchers using version 2 of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [3]. All disagreements were discussed until consensus was reached.

Data analysis

Heterogeneity was assessed using the I² statistic, with a value > 50% considered to represent substantial heterogeneity. When substantial heterogeneity was noted, heterogeneity due to a “threshold effect” was analyzed using the Spearman correlation coefficient (p < 0.05 indicated a threshold effect). We used a random effects model for the primary meta-analysis to obtain summary estimates of sensitivity, specificity, positive likelihood ratio (LR +), negative likelihood ratio (LR −) and diagnostic odds ratio (DOR) with 95% CIs, as well as the positive predictive value (PPV) and negative predictive value (NPV), for each diagnostic method. If there was no substantial heterogeneity among studies, the data were pooled with a fixed effects model. We then constructed a summary receiver operating characteristic (SROC) curve and calculated the area under the curve (AUC).
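The pooling step can be sketched as follows. This is a minimal illustration that assumes inverse-variance weighting of log diagnostic odds ratios with a DerSimonian-Laird estimate of the between-study variance; the statistical software may implement the random effects model differently. The function names, the 0.5 continuity correction and the example counts are assumptions of this sketch, not part of the study.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its variance for one 2 x 2 table.

    Adds 0.5 to every cell (an assumed continuity correction) so that
    tables containing a zero cell remain usable.
    """
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def pool(effects, variances):
    """Return (fixed-effect estimate, random-effects estimate, I^2 in percent)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    random_est = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return fixed, random_est, i2


# Hypothetical tables (tp, fp, fn, tn); not data from the included studies.
tables = [(40, 4, 10, 46), (30, 10, 5, 55), (25, 2, 15, 58)]
effects, variances = zip(*(log_dor_and_variance(*t) for t in tables))
fixed, random_est, i2 = pool(effects, variances)

# Decision rule from the text: I^2 > 50% means substantial heterogeneity.
use_random = i2 > 50
pooled_dor = math.exp(random_est if use_random else fixed)
print(f"I^2 = {i2:.1f}%, random effects: {use_random}, pooled DOR = {pooled_dor:.2f}")
```

Pooling on the log scale and exponentiating afterwards keeps the DOR estimate positive and makes its sampling distribution closer to normal.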

Subgroup analyses were performed for the following covariates: (1) study design (prospective versus retrospective), (2) number of cases (≤ 60 versus > 60) and (3) final diagnosis method (intraoperative cholangiography with/without surgery or histology versus surgery and/or histology). In addition, publication bias was assessed with a Deeks funnel plot (p < 0.05 was considered to indicate statistically significant publication bias). We used Meta-DiSc 1.4 and Stata 14 to perform the statistical analyses.
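The funnel plot asymmetry test can be sketched as a weighted regression. The sketch below assumes the usual formulation of the Deeks test, in which the log DOR is regressed on the inverse square root of the effective sample size (ESS), with ESS as the weight. It returns only the slope; the p value would still have to be obtained from a t distribution in a statistics package. All function names and the 0.5 continuity correction are assumptions made for illustration.

```python
import math


def effective_sample_size(n_diseased, n_non_diseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def weighted_slope(x, y, w):
    """Slope of the weighted least-squares line y = a + b * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def deeks_slope(tables):
    """Slope of log DOR against 1/sqrt(ESS), weighted by ESS.

    Each table is (tp, fp, fn, tn). A slope far from zero suggests
    funnel plot asymmetry.
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))  # assumed correction
        x.append(1 / math.sqrt(ess))
        y.append(math.log((a * d) / (b * c)))
        w.append(ess)
    return weighted_slope(x, y, w)
```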

Results

Study selection

The initial search of the PubMed, EMBASE and Web of Science databases yielded 1489 studies. Figure 1 shows the flow diagram of the study selection. Of the 80 full-text articles assessed for final eligibility, 42 were excluded (4 without full text, 3 non-English, 6 without sufficient data, 1 evaluated by two or more researchers without a consensus, 10 with incorrect data, 13 with overlapping cases and 5 screening studies).

Fig. 1 Flow diagram of the study selection process

Study characteristics

A total of 3053 patients were included in the 38 studies [4–41] (Table 1) included for analysis. Of these studies, 25 were prospective, 10 were retrospective and 3 could not be clearly classified. The studies were published between 1985 and 2016. They most commonly originated from China (8/38 studies), followed by Korea (7/38 studies) and the USA (6/38 studies). The overall quality of the included studies, as assessed by QUADAS-2 (Table 2), was moderate, and all of the studies were at low risk of bias on 5 or more of the 7 items.

Table 1 The characteristics of the studies included in this study
Table 2 Risk of Bias assessed by QUADAS-2

In 21 articles, the final diagnosis of BA explicitly included intraoperative cholangiography (IC): 6 used IC alone, 6 used IC and surgery, 8 used IC and histology, and 1 used IC, surgery and histology. In addition, 1 article used a final diagnosis method that did not include IC, 8 articles made the final diagnosis by surgery with/without histology, and 8 articles did not state how the final diagnosis of BA was made. Of the 38 articles, 25 performed the diagnostic test while the reference test results were unknown, 10 knew the reference test results in advance and 3 did not report this.

Diagnostic values

B-US

Data on the diagnostic performance of B-US were collected from 23 studies with 1774 patients (Table 3). The Spearman correlation coefficient was 0.033 (p = 0.883), indicating no threshold effect. The diagnostic odds ratio was 46.02 (95% CI 22.71–93.27) and I² was 71.4%, showing high heterogeneity among the studies.

Table 3 Diagnostic profile of various diagnostic methods

The forest plots of the sensitivity and specificity of B-US are shown in Figs. 2 and 3. The sensitivities and specificities of individual studies varied from 31 to 99% and from 71 to 100%, respectively. B-US showed a pooled sensitivity of 77% (95% CI 74–80%), specificity of 93% (95% CI 91–94%), LR + of 8.48 (95% CI 5.52–13.02) and LR − of 0.28 (95% CI 0.20–0.39). The summary ROC curve of B-US for the diagnosis of biliary atresia is illustrated in Fig. 4. The summary ROC curve was symmetric, with an AUC of 0.9396 and a Q value of 0.8770. The PPV was 88.6% and the NPV was 85.3%.

Fig. 2 The forest plots of pooled sensitivity for B-US
Fig. 3 The forest plots of pooled specificity for B-US
Fig. 4 Summary receiver operating characteristics (SROC) curve of the B-US

MRCP

Data on the diagnostic performance of MRCP were collected from five studies with 381 patients (Table 3). The Spearman correlation coefficient was 0.000 (p = 1.000), indicating no threshold effect. The diagnostic odds ratio was 43.49 (95% CI 8.53–221.83) and I² was 64.3%, showing high heterogeneity among the studies.

The forest plots of the sensitivity and specificity of MRCP are shown in Figs. 5 and 6. The sensitivities and specificities of individual studies varied from 85 to 100% and from 36 to 96%, respectively. MRCP showed a pooled sensitivity of 96% (95% CI 92–98%), specificity of 58% (95% CI 51–65%), LR + of 2.96 (95% CI 1.58–5.55) and LR − of 0.08 (95% CI 0.02–0.30). The summary ROC curve of MRCP for the diagnosis of biliary atresia is illustrated in Fig. 7. The summary ROC curve was symmetric, with an AUC of 0.9409 and a Q value of 0.8788. The PPV was 68.0% and the NPV was 94.3%.

Fig. 5 The forest plots of pooled sensitivity for MRCP
Fig. 6 The forest plots of pooled specificity for MRCP
Fig. 7 Summary receiver operating characteristics (SROC) curve of the MRCP

Acholic stool

Data on the diagnostic performance of acholic stool were collected from seven studies with 610 patients (Table 3). The Spearman correlation coefficient was 0.071 (p = 0.879), indicating no threshold effect. The diagnostic odds ratio was 30.66 (95% CI 17.48–53.76) and I² was 0.0%, showing low heterogeneity among the studies.

The forest plots of the sensitivity and specificity of acholic stool are shown in Figs. 8 and 9. The sensitivities and specificities of individual studies varied from 58 to 100% and from 56 to 100%, respectively. Acholic stool showed a pooled sensitivity of 87% (95% CI 82–91%), specificity of 78% (95% CI 74–82%), LR + of 3.87 (95% CI 3.17–4.72) and LR − of 0.17 (95% CI 0.12–0.23). The summary ROC curve of acholic stool for the diagnosis of biliary atresia is illustrated in Fig. 10. The summary ROC curve was symmetric, with an AUC of 0.9238 and a Q value of 0.8578. The PPV was 70.4% and the NPV was 91.2%.

Fig. 8 The forest plots of pooled sensitivity for acholic stool
Fig. 9 The forest plots of pooled specificity for acholic stool
Fig. 10 Summary receiver operating characteristics (SROC) curve of the acholic stool

Serum liver function test

Data on the diagnostic performance of the serum liver function test were collected from seven studies with 494 patients (Table 3). The Spearman correlation coefficient was 0.036 (p = 0.939), indicating no threshold effect. The diagnostic odds ratio was 19.00 (95% CI 4.99–72.30) and I² was 82.4%, showing high heterogeneity among the studies.

The forest plots of the sensitivity and specificity of the serum liver function test are shown in Figs. 11 and 12. The sensitivities and specificities of individual studies varied from 66 to 100% and from 32 to 98%, respectively. The serum liver function test showed a pooled sensitivity of 84% (95% CI 78–89%), specificity of 97% (95% CI 97–98%), LR + of 4.73 (95% CI 0.66–34.02) and LR − of 0.26 (95% CI 0.14–0.51). The summary ROC curve of the serum liver function test for the diagnosis of biliary atresia is illustrated in Fig. 13. The summary ROC curve was symmetric, with an AUC of 0.9080 and a Q value of 0.8399. The PPV was 62.5% and the NPV was 79.3%.

Fig. 11 The forest plots of pooled sensitivity for serum liver function test
Fig. 12 The forest plots of pooled specificity for serum liver function test
Fig. 13 Summary receiver operating characteristics (SROC) curve of the serum liver function test

Hepatobiliary scintigraphy

Data on the diagnostic performance of hepatobiliary scintigraphy were collected from 18 studies with 1423 patients (Table 3). The Spearman correlation coefficient was −0.613 (p = 0.007), indicating a threshold effect. The diagnostic odds ratio was 43.11 (95% CI 19.98–93.00) and I² was 53.4%, showing high heterogeneity among the studies.

The forest plots of the sensitivity and specificity of hepatobiliary scintigraphy are shown in Figs. 14 and 15. The sensitivities and specificities of individual studies varied from 84 to 100% and from 35 to 93%, respectively. Hepatobiliary scintigraphy showed a pooled sensitivity of 96% (95% CI 94–97%), specificity of 73% (95% CI 70–76%), LR + of 3.26 (95% CI 2.38–4.48) and LR − of 0.09 (95% CI 0.05–0.16). The summary ROC curve of hepatobiliary scintigraphy for the diagnosis of biliary atresia is illustrated in Fig. 16. The summary ROC curve was symmetric, with an AUC of 0.9300 and a Q value of 0.8651. The PPV was 64.5% and the NPV was 97.2%.

Fig. 14 The forest plots of pooled sensitivity for hepatobiliary scintigraphy
Fig. 15 The forest plots of pooled specificity for hepatobiliary scintigraphy
Fig. 16 Summary receiver operating characteristics (SROC) curve of the hepatobiliary scintigraphy

Percutaneous liver biopsy

Data on the diagnostic performance of percutaneous liver biopsy were collected from 11 studies with 646 patients (Table 3). The Spearman correlation coefficient was −0.109 (p = 0.749), indicating no threshold effect. The diagnostic odds ratio was 348.51 (95% CI 148.74–816.63) and I² was 0.0%, showing low heterogeneity among the studies.

The forest plots of the sensitivity and specificity of percutaneous liver biopsy are shown in Figs. 17 and 18. The sensitivities and specificities of individual studies varied from 90 to 100% and from 84 to 100%, respectively. Percutaneous liver biopsy showed a pooled sensitivity of 98% (95% CI 96–99%), specificity of 93% (95% CI 89–95%), LR + of 12.09 (95% CI 8.28–17.63) and LR − of 0.03 (95% CI 0.02–0.06). The summary ROC curve of percutaneous liver biopsy for the diagnosis of biliary atresia is illustrated in Fig. 19. The summary ROC curve was symmetric, with an AUC of 0.9882 and a Q value of 0.9543. The PPV was 93.0% and the NPV was 97.7%.

Fig. 17 The forest plots of pooled sensitivity for percutaneous liver biopsy
Fig. 18 The forest plots of pooled specificity for percutaneous liver biopsy
Fig. 19 Summary receiver operating characteristics (SROC) curve of the percutaneous liver biopsy

Subgroup analyses

We performed subgroup analyses for B-US, MRCP and serum liver function test, and the results are presented in Table 4. According to these results, the heterogeneity among the articles evaluating MRCP was caused by study design.

Table 4 Subgroup analyses of B-US, MRCP and serum liver function test

Publication bias

We constructed Deeks funnel plots to assess publication bias among the studies of B-US, MRCP, acholic stool, serum liver function test, hepatobiliary scintigraphy and percutaneous liver biopsy. No significant publication bias was found for any method (p = 0.10, 0.97, 0.59, 0.87, 0.11 and 0.09, respectively).

Discussion

A good prognosis after Kasai portoenterostomy depends on early diagnosis and early operation. However, BA and other diseases causing cholestatic jaundice have much in common in their symptoms and laboratory findings. No early diagnostic method for BA is 100% accurate, which makes it difficult to diagnose BA within 2 months. Therefore, in this meta-analysis, studies that evaluated several diagnostic methods were given precedence.

In clinical practice, BA is finally diagnosed by intraoperative cholangiography with/without intraoperative liver biopsy. Thus, although preoperative liver biopsy is the most accurate method based on AUC, it is not the method for final diagnosis of BA because it is not 100% accurate. In addition, it is invasive and can lead to complications. Surgeons therefore prefer noninvasive methods for early diagnosis. Since no noninvasive method has both high sensitivity and high specificity, combining a method with high sensitivity and a method with high specificity may be a good idea. According to our data, the best combination is MRCP/hepatobiliary scintigraphy (high sensitivity) with B-US/serum liver function test (high specificity). However, hepatobiliary scintigraphy involves radiation exposure. Considering that acholic stool is convenient to assess and its sensitivity is acceptable, the combination of MRCP/acholic stool and B-US/serum liver function test could be the first choice. However, Ağın [42] reported that the combination of B-US, acholic stool and GGT for diagnosing BA had a sensitivity of 55.9% and a specificity of 95%, which is disappointing because of the low sensitivity.

Although sensitivity and specificity are direct indices, they can be influenced by the cutoff value. Predictive values (PV) can also be used to find the best method. A PV uses the test result to estimate the probability that a patient has or does not have the disease. We can therefore first use a method with a high PPV to make a definite diagnosis of BA and then, if BA cannot be confirmed, use a method with a high NPV to exclude it. By this criterion, the combination of B-US (high PPV) and MRCP/acholic stool/hepatobiliary scintigraphy (high NPV) is the best. For the reasons given above, the combination of B-US and MRCP/acholic stool may be the first choice.
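To make the dependence of predictive values on prevalence concrete, the following minimal Python sketch applies Bayes' rule to the pooled B-US sensitivity (77%) and specificity (93%) reported in the Results. The prevalence values and the function name `predictive_values` are hypothetical and serve only as an illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence                # diseased, test positive
    fn = (1 - sensitivity) * prevalence          # diseased, test negative
    tn = specificity * (1 - prevalence)          # non-diseased, test negative
    fp = (1 - specificity) * (1 - prevalence)    # non-diseased, test positive
    return tp / (tp + fp), tn / (tn + fn)


# Pooled B-US estimates from the Results; the prevalences are assumed.
for prevalence in (0.2, 0.4, 0.6):
    ppv, npv = predictive_values(0.77, 0.93, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as the assumed prevalence increases, which is why predictive values from one cohort may not transfer to another.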

In addition, the prevalence of disease may influence the performance indices of a diagnostic method. With respect to prevalence, LR + and LR − are more stable than sensitivity, specificity, PPV and NPV. On this basis, the combination of B-US (high LR +) and MRCP/hepatobiliary scintigraphy (low LR −) could be the better choice. Because hepatobiliary scintigraphy involves radiation exposure, B-US can be used first to make a definite diagnosis of BA, and MRCP can then be performed to exclude BA if it cannot be confirmed. Sung [43] demonstrated that combining US with MRCP achieved better diagnostic performance for discriminating between BA and non-BA (sensitivity, specificity, accuracy, PPV and NPV were 98, 91, 95, 95 and 95% for one observer and 98, 83, 92, 91 and 95% for the other).
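The way likelihood ratios update the probability of BA can be sketched as follows, using the pooled LR + and LR − of B-US and the pooled LR − of MRCP from the Results. The pre-test probability of 40% is hypothetical, and chaining two tests by multiplying their likelihood ratios assumes that the tests are conditionally independent, which this meta-analysis does not establish.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


LR_POS_BUS = 8.48    # pooled LR + of B-US (Results)
LR_NEG_BUS = 0.28    # pooled LR - of B-US (Results)
LR_NEG_MRCP = 0.08   # pooled LR - of MRCP (Results)
pre_test = 0.40      # hypothetical pre-test probability of BA

# Step 1: a positive B-US raises the probability of BA.
after_positive_bus = post_test_probability(pre_test, LR_POS_BUS)

# Step 2: if B-US is negative, a negative MRCP lowers it further
# (assumes conditional independence of the two tests).
after_negative_bus = post_test_probability(pre_test, LR_NEG_BUS)
after_both_negative = post_test_probability(after_negative_bus, LR_NEG_MRCP)

print(f"after positive B-US: {after_positive_bus:.1%}")
print(f"after negative B-US and negative MRCP: {after_both_negative:.1%}")
```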

Certainly, more clinical studies are needed to assess combination strategies for diagnosing BA. If the diagnosis remains uncertain, hepatobiliary scintigraphy is needed. Liver biopsy should be performed in most infants with undiagnosed cholestasis [44].

Limitations

Our study has several limitations. First, although some subgroups showed no heterogeneity, other subgroups for the same covariate still showed heterogeneity or could not be analyzed because too few articles were included. The heterogeneity may therefore be caused by other factors. We had intended to add one more covariate, the mean age of patients (≤ 60 versus > 60 days), but only some of the studies reported it. We therefore abandoned this analysis, which we regard as the greatest limitation of our meta-analysis. Differences in diagnostic equipment may also have contributed to the heterogeneity. Second, excluding non-English articles and the absence of gray literature could cause bias. Third, we excluded all articles with incorrect or insufficient data to construct a diagnostic 2 × 2 table, and we did not contact authors to obtain the raw data, which probably also led to bias.

Conclusions

The results of this meta-analysis showed that the accuracy of percutaneous liver biopsy is better than that of all the noninvasive methods. Taking into consideration the advantages and disadvantages of the six methods, a combination of multidisciplinary noninvasive diagnostic methods is the first choice for the differential diagnosis of BA from other causes of neonatal cholestasis.