Introduction

Esophageal atresia (EA) is a congenital malformation that occurs in 1 in 3000 neonates [1]; more than 90% of patients with EA also have a tracheoesophageal fistula (TEF) between the trachea and the esophagus [2].

Traditionally, EA has been repaired via a right posterolateral thoracotomy. Open repair (OR) of EA with TEF consists of isolation of the fistula, dissection of the upper pouch, mobilization of the lower pouch, and completion of the anastomosis [2, 3]. The first thoracoscopic repair (TR) of pure EA was performed in 1999 [4], and the first successful TR of EA with TEF was reported 1 year later [5]. Following these milestones, numerous children's health centers began adopting TR for patients with EA and TEF [6,7,8,9].

Although many advanced children's medical centers now perform TR for EA with TEF, its safety and efficacy remain controversial. This meta-analysis aimed to compare the outcomes of OR and TR in patients with EA and TEF and to provide clearer evidence on whether TR is a feasible treatment for these patients.

Materials and methods

Literature search

We systematically searched PubMed, the Cochrane Library, and Medline for published studies comparing the clinical outcomes of OR and TR in patients with EA and TEF. The search strategy was (minimally invasive repair OR minimally invasive surgery OR thoracoscopic OR thoracoscopy) AND (open repair OR open surgery OR thoracotomy) AND (esophageal atresia OR EA) AND (tracheoesophageal fistula OR TEF). When necessary, we contacted the original authors by e-mail to obtain additional information.
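
The Boolean strategy above combines synonyms with OR inside each concept and joins the four concepts with AND. The sketch below rebuilds that exact string and, as an illustration only, wraps it in an NCBI E-utilities esearch URL for PubMed. The helper names and the retmax value are assumptions, and the source does not say that the search was run through E-utilities.

```python
from urllib.parse import urlencode

# Concept groups copied from the search strategy; synonyms within a group are ORed.
TERM_GROUPS = [
    ["minimally invasive repair", "minimally invasive surgery", "thoracoscopic", "thoracoscopy"],
    ["open repair", "open surgery", "thoracotomy"],
    ["esophageal atresia", "EA"],
    ["tracheoesophageal fistula", "TEF"],
]


def build_query(groups):
    """OR the synonyms within each concept, then AND the concepts together."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


def pubmed_esearch_url(query, retmax=200):
    """Hypothetical helper: an E-utilities esearch URL for PubMed (retmax is illustrative)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)


query = build_query(TERM_GROUPS)
print(query)
print(pubmed_esearch_url(query))
```

Keeping the groups as data makes it easy to add a synonym to one concept without retyping the whole string.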

Study selection

Studies were included in this systematic review if they met the following criteria: (1) observational design (cohort or case–control study) or randomized controlled trial (RCT); and (2) comparison of clinical outcomes between OR and TR for EA with TEF.

Studies were excluded if any of the following applied: (1) review, conference record, case report, or animal experiment; (2) inclusion of patients with EA without TEF; or (3) multiple publications based on the same data.

Two reviewers (W.Y.H. and K.H.Y.) independently screened all studies, and disagreements about eligibility were resolved by discussion. The literature search and study selection were double-checked, and studies that included patients with EA without TEF were excluded.

Data extraction

Both reviewers extracted data independently, then exchanged and cross-checked them for accuracy. The following information was extracted: (1) basic characteristics of the included studies: first author, publication year, study region, study design, surgical approach, sample size, gestational age, birth weight, associated anomalies, and conversion rate; (2) clinical outcomes of both surgical approaches: operative time, length of hospital stay, time to first oral feeding, occurrence rates of leaks and strictures, pulmonary complications, fundoplication rate for gastroesophageal reflux disease (GERD), blood loss, and ventilation time. For RCTs with multiple groups, only the experimental and control groups involving patients with EA and TEF were extracted.

Quality and level of evidence assessment

Both reviewers independently assessed study quality and the level of evidence, and any disagreements were resolved by discussion.

For non-randomized controlled trials (NRCTs), we used the Methodological Index for Non-Randomized Studies (MINORS) [10] to assess methodological quality. For comparative studies, MINORS contains 12 items: (1) a clearly stated aim; (2) inclusion of consecutive patients; (3) prospective collection of data; (4) endpoints appropriate to the aim of the study; (5) unbiased assessment of the study endpoint; (6) follow-up period appropriate to the aim of the study; (7) loss to follow-up less than 5%; (8) prospective calculation of the study size; (9) an adequate control group; (10) contemporary groups; (11) baseline equivalence of groups; and (12) adequate statistical analyses. Each item is scored 0 (not reported), 1 (reported but inadequate), or 2 (reported and adequate), giving a maximum total score of 24; a score of ≥16 indicated high quality [11], and lower scores indicated low quality.
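
The MINORS scoring rule can be written as a short function. This is a minimal sketch. It assumes that each of the 12 comparative-study items is recorded as 0, 1, or 2, and it applies the ≥16 threshold used here. The function name minors_quality and the example scores are hypothetical.

```python
def minors_quality(item_scores):
    """Sum the 12 MINORS items for a comparative study and classify quality.

    item_scores: 12 integers, each 0 (not reported), 1 (reported but
    inadequate), or 2 (reported and adequate).
    Returns (total, label), where label is "high" when total >= 16.
    """
    if len(item_scores) != 12:
        raise ValueError("comparative studies are scored on 12 MINORS items")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("each MINORS item is scored 0, 1, or 2")
    total = sum(item_scores)  # the maximum possible total is 24
    return total, ("high" if total >= 16 else "low")


# Hypothetical study: items 3 and 8 not reported, item 7 reported but inadequate.
scores = [2, 2, 0, 2, 2, 2, 1, 0, 2, 2, 2, 2]
print(minors_quality(scores))  # (19, 'high')
```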

For RCTs, we used the Cochrane Collaboration's tool [12] to assess the risk of bias. This tool covers six domains: (1) randomization method; (2) allocation concealment; (3) blinding of participants, personnel, and outcome assessment; (4) incomplete outcome data; (5) selective reporting; and (6) other sources of bias.
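
The source does not say how the six domain judgements were combined into an overall rating. The sketch below assumes the convention described for the original Cochrane tool: any high-risk domain makes the trial high risk; otherwise any unclear domain makes it unclear; otherwise it is low risk. It also assumes that all six domains count as key domains. The domain keys and the function name are hypothetical.

```python
DOMAINS = (
    "randomization method",
    "allocation concealment",
    "blinding",
    "incomplete outcome data",
    "selective reporting",
    "other bias",
)


def overall_risk_of_bias(judgements):
    """Summarize per-domain judgements ('low', 'unclear', 'high') into one rating.

    Assumes every domain is a key domain: any 'high' -> 'high',
    else any 'unclear' -> 'unclear', else 'low'.
    """
    missing = set(DOMAINS) - set(judgements)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    levels = {judgements[d] for d in DOMAINS}
    if not levels <= {"low", "unclear", "high"}:
        raise ValueError("judgements must be 'low', 'unclear', or 'high'")
    if "high" in levels:
        return "high"
    if "unclear" in levels:
        return "unclear"
    return "low"


# Hypothetical trial without allocation concealment or blinding.
example = {
    "randomization method": "low",
    "allocation concealment": "high",
    "blinding": "high",
    "incomplete outcome data": "low",
    "selective reporting": "unclear",
    "other bias": "low",
}
print(overall_risk_of_bias(example))  # high
```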

We assessed the level of evidence using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system [13], with GRADEprofiler 3.6 software. GRADE rates the quality of evidence as follows: (1) high quality: further research is very unlikely to change our confidence in the estimate of effect; (2) moderate quality: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate; (3) low quality: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate; (4) very low quality: any estimate of effect is very uncertain.
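
In GRADE, evidence from RCTs starts at high quality and evidence from observational studies starts at low quality. Quality then moves down for risk of bias, inconsistency, indirectness, imprecision, or publication bias, and can move up, for example for a large effect. The sketch below models only that start-and-adjust arithmetic, assuming the number of levels to move is already decided. It does not reproduce GRADEprofiler. The function name is hypothetical.

```python
LEVELS = ("very low", "low", "moderate", "high")


def grade_certainty(design, downgrades=0, upgrades=0):
    """Return a GRADE quality level from the study design and net adjustments.

    design: "rct" (starts at high) or "observational" (starts at low).
    downgrades: total levels removed across risk of bias, inconsistency,
        indirectness, imprecision, and publication bias (typically 1 per
        serious and 2 per very serious concern).
    upgrades: levels added, e.g. for a large effect in observational evidence.
    """
    starts = {"rct": 3, "observational": 1}
    if design not in starts:
        raise ValueError("design must be 'rct' or 'observational'")
    index = starts[design] - downgrades + upgrades
    return LEVELS[min(max(index, 0), len(LEVELS) - 1)]  # clamp to the scale


print(grade_certainty("rct", downgrades=1))            # moderate
print(grade_certainty("observational"))                # low
print(grade_certainty("observational", downgrades=1))  # very low
```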

Statistical analysis

All statistical analyses were performed with Stata 12.0 (StataCorp, TX), and P < 0.05 was considered statistically significant. Odds ratios and standardized mean differences (SMDs) were used for dichotomous and continuous data, respectively. Heterogeneity between pooled studies was assessed with Cochran's Q test and the I² statistic, with I² > 50% indicating substantial heterogeneity. When I² > 50%, a random-effects model was used, and subgroup and sensitivity analyses were performed to explore the sources of heterogeneity; otherwise, a fixed-effects model was used. When only the median and range were available, the formulas of Hozo et al. [14] were used to estimate the mean and standard deviation. Publication bias was assessed with a funnel plot, Begg's test, and Egger's test.
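
The per-study inputs to such a pooling can be computed as below. This is a sketch under stated assumptions, not the Stata routine that was used. Odds ratios are handled as log odds ratios with Woolf variances, adding 0.5 to every cell when any cell is zero (a common convention). The SMD is computed as Hedges' g, one common variant; the source does not say which SMD estimator Stata applied. The median-and-range conversion follows the rules of thumb usually attributed to Hozo et al.: the formula mean for n ≤ 25 and the median otherwise, and a variance formula for n ≤ 15, range/4 for 15 < n ≤ 70, and range/6 above that. All function names are hypothetical.

```python
import math


def log_odds_ratio(events_1, n_1, events_2, n_2):
    """Log odds ratio (group 1 vs group 2) and its Woolf variance.

    Adds 0.5 to every cell when any cell is zero (continuity correction).
    """
    a, b = events_1, n_1 - events_1
    c, d = events_2, n_2 - events_2
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference (Hedges' g) and its approximate variance."""
    n = n_1 + n_2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n - 2))
    d = (mean_1 - mean_2) / pooled_sd
    j = 1 - 3 / (4 * n - 9)  # small-sample bias correction
    var_d = n / (n_1 * n_2) + d**2 / (2 * n)
    return j * d, j**2 * var_d


def hozo_mean_sd(minimum, median, maximum, n):
    """Approximate mean and SD from median and range (Hozo et al. rules of thumb)."""
    mean = (minimum + 2 * median + maximum) / 4 if n <= 25 else median
    if n <= 15:
        sd = math.sqrt(((minimum - 2 * median + maximum) ** 2 / 4
                        + (maximum - minimum) ** 2) / 12)
    elif n <= 70:
        sd = (maximum - minimum) / 4
    else:
        sd = (maximum - minimum) / 6
    return mean, sd


# Hypothetical arm of 20 patients: median 5 days, range 2-14 days.
print(hozo_mean_sd(2, 5, 14, 20))  # (6.5, 3.0)
```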

Results

Literature search and study selection

A total of 130 records were identified initially. After removal of duplicates and screening of titles and abstracts, 13 studies remained for full-text assessment of eligibility. The final meta-analysis was based on 10 studies: 9 NRCTs and 1 RCT. The study selection process is shown in Fig. 1.

Fig. 1

Flow chart of the literature screening process for this meta-analysis

Characteristics of included studies and quality assessments

Of the 10 included studies, 9 were NRCTs [15,16,17,18,19,20,21,22,23] and 1 was an RCT [24]. A total of 447 patients were included in this meta-analysis: 217 in the TR group and 230 in the OR group.

Because data from the single RCT could not be pooled with those of the NRCTs, the RCT underwent quality and risk-of-bias assessment only, using the Cochrane Collaboration's tool; the NRCTs were assessed with the MINORS guidelines. The characteristics and quality assessments of the included studies are shown in Table 1.

Table 1 Characteristics of included studies

A funnel plot of the occurrence of leaks was used to explore publication bias (Fig. 2). Seven studies were included in the funnel plot, and no significant publication bias was found (Begg's test P = 1.0; Egger's test P = 0.842).

Fig. 2

Funnel plot of occurrence of leaks
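
Egger's test regresses each study's standard normal deviate (effect divided by its standard error) on its precision (1/SE). An intercept far from zero suggests funnel-plot asymmetry. The sketch below implements the unweighted ordinary least-squares form of the test. It returns the t statistic, to be compared with a t distribution on k − 2 degrees of freedom; for example, 2 * scipy.stats.t.sf(abs(t), df) gives a two-sided P value if SciPy is available. Software implementations may differ in detail, and Begg's rank-correlation test is not shown. The function name and the example data are hypothetical.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression: standard normal deviate (effect/SE) on precision (1/SE).

    Returns (intercept, se_intercept, t_stat, df) with df = k - 2.
    """
    k = len(effects)
    if k < 3 or len(std_errors) != k:
        raise ValueError("need at least three studies with matching SEs")
    x = [1 / s for s in std_errors]                   # precision
    y = [e / s for e, s in zip(effects, std_errors)]  # standard normal deviate
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (k - 2)  # residual variance
    if sigma2 == 0:
        raise ValueError("perfect fit: intercept standard error is zero")
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, k - 2


# Hypothetical log odds ratios and standard errors for four studies.
print(egger_test([1.0, 1.0, 0.5, 0.8], [1.0, 0.5, 0.25, 0.2]))
```

With only a handful of studies, as here, the test has little power, so a non-significant result is weak reassurance.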

Occurrence rate of leaks

Seven non-randomized concurrent controlled trials (NRCCTs) and one historical controlled trial (HCT) reported the occurrence rate of leaks and were included in this analysis. The heterogeneity test showed no significant heterogeneity (χ² = 3.35, P = 0.764, I² = 0%), so a fixed-effects model was used. Compared with OR, TR did not significantly increase the occurrence rate of leaks (odds ratio 1.747; 95% CI 0.817–3.737; P = 0.15) (Fig. 3).

Fig. 3

Meta-analysis of the occurrence rate of leaks
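
Each pooled result reports an estimate, a 95% CI, and a P value, so the three can be cross-checked with simple arithmetic. The sketch below assumes Wald-type intervals. For odds ratios these are estimate ± 1.96 SE on the log scale; for SMDs, pass log_scale=False to work on the raw scale. This is typical of inverse-variance pooling but is not stated in the source. The function name is hypothetical.

```python
import math


def check_ci(estimate, lower, upper, log_scale=True, z_crit=1.959964):
    """Back-calculate SE, z and a two-sided P value from an estimate and its 95% CI.

    Assumes a Wald-type CI: symmetric on the log scale for ratio measures,
    symmetric on the raw scale for SMDs (log_scale=False). Also returns the
    CI centre (geometric mean of the bounds for ratios), which should
    reproduce the reported estimate up to rounding.
    """
    f = math.log if log_scale else (lambda v: v)
    se = (f(upper) - f(lower)) / (2 * z_crit)
    z = f(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal P value
    centre = (f(lower) + f(upper)) / 2
    implied = math.exp(centre) if log_scale else centre
    return se, z, p, implied


# Pooled odds ratio for leaks reported above: 1.747 (95% CI 0.817-3.737), P = 0.15.
se, z, p, implied = check_ci(1.747, 0.817, 3.737)
print(round(p, 2), round(implied, 3))  # 0.15 1.747
```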

Occurrence rate of strictures

Six NRCCTs and one HCT reported the occurrence rate of strictures and were included in this analysis. The heterogeneity test showed no significant heterogeneity (χ² = 8.15, P = 0.227, I² = 26.4%), so a fixed-effects model was used. Compared with OR, TR did not increase the occurrence rate of strictures (odds ratio 0.937; 95% CI 0.5–1.757; P = 0.839) (Fig. 4).

Fig. 4

Meta-analysis of the occurrence rate of strictures

Fundoplication rate of post-operative gastroesophageal reflux disease (GERD)

Three NRCCTs and one HCT reported the fundoplication rate for post-operative GERD and were included in this analysis. The heterogeneity test showed no significant heterogeneity (χ² = 1.86, P = 0.601, I² = 0%), so a fixed-effects model was used. Compared with OR, TR did not significantly increase the fundoplication rate for post-operative GERD (odds ratio 1.642; 95% CI 0.855–3.153; P ≈ 0.14) (Fig. 5).

Fig. 5

Meta-analysis of fundoplication rate of post-operative GERD

Occurrence rate of pulmonary complications

The pulmonary complications included in this analysis were repeated pneumonia, atelectasis, pneumothorax, and pleural empyema. Four NRCCTs reported the occurrence rate of pulmonary complications and were included in this analysis. The heterogeneity test showed significant heterogeneity (χ² = 7.11, P = 0.069, I² = 57.8%), so a random-effects model was used (Fig. 6). Compared with TR, OR did not significantly increase pulmonary complications (odds ratio 1.08; 95% CI 0.21–5.44; P = 0.897).

Fig. 6

Meta-analysis of occurrence rate of pulmonary complication
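
The model-selection rule in the Statistical analysis section can be expressed as follows. Pool with inverse-variance weights, compute Cochran's Q and I², and switch to a random-effects model when I² exceeds 50%. The sketch below uses the DerSimonian–Laird estimator for the between-study variance. That is the usual default in Stata's meta-analysis commands, but the source does not name the estimator, so treat it as an assumption. Inputs are per-study log odds ratios or SMDs with their variances; the function name is hypothetical.

```python
import math


def pool_inverse_variance(effects, variances, i2_threshold=0.5):
    """Inverse-variance pooling of per-study effects (log odds ratios or SMDs).

    Fixed-effect model when I^2 <= i2_threshold, otherwise a
    DerSimonian-Laird random-effects model. I^2 is returned as a
    fraction (0.578 means 57.8%).
    """
    k = len(effects)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two studies with matching variances")
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    tau2 = 0.0
    if i2 > i2_threshold:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)  # between-study variance (DL moment estimator)
        w = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {
        "model": "random" if i2 > i2_threshold else "fixed",
        "pooled": pooled,
        "ci95": (pooled - 1.959964 * se, pooled + 1.959964 * se),
        "p": math.erfc(abs(pooled / se) / math.sqrt(2)),  # two-sided P value
        "Q": q,
        "df": df,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical, deliberately heterogeneous studies.
result = pool_inverse_variance([0.0, 1.0, 2.0], [0.01, 0.01, 0.01])
print(result["model"], round(result["I2"], 2), round(result["tau2"], 2))  # random 0.99 0.99
```

For odds ratios, exponentiate the pooled value and the CI bounds to return to the ratio scale.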

Post-operative ventilation time

Six NRCCTs analyzed post-operative ventilation time for both surgical approaches and were included in this analysis. The heterogeneity test showed significant heterogeneity (χ² = 14.17, P = 0.015, I² = 64.7%), so a random-effects model was used (Fig. 7). Compared with TR, OR did not significantly increase post-operative ventilation time (SMD 0.474; 95% CI −0.02 to 0.968; P = 0.06).

Fig. 7

Meta-analysis of post-operative ventilation time

Because of the significant heterogeneity, a sensitivity analysis was performed. It showed that excluding Matsunari et al.'s study [21] substantially changed the pooled result (Fig. 8). In a repeat meta-analysis without this study, the pooled SMD indicated that, compared with TR, OR significantly increased post-operative ventilation time (SMD 0.61; 95% CI 0.16–1.07; P < 0.01) (Fig. 9).

Fig. 8

Sensitivity analysis of post-operative ventilation time

Fig. 9

Meta-analysis of post-operative ventilation time excluding Matsunari et al.'s study
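
The sensitivity analyses described here omit each study in turn and re-pool the rest, which shows whether a single study drives the result. The sketch below assumes a DerSimonian–Laird random-effects model, matching the analyses that used random effects. It returns each omission's pooled estimate and 95% CI, so a reader can spot omissions whose interval no longer crosses zero. Study labels and data are hypothetical, as are the function names.

```python
import math


def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate and its standard error."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star))


def leave_one_out(labels, effects, variances, z_crit=1.959964):
    """Re-pool the remaining studies with each study omitted in turn."""
    rows = []
    for i, label in enumerate(labels):
        rest_y = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        pooled, se = dl_pool(rest_y, rest_v)
        rows.append((label, pooled, pooled - z_crit * se, pooled + z_crit * se))
    return rows


# Hypothetical SMDs: study D points the other way from A-C.
rows = leave_one_out(["A", "B", "C", "D"], [0.6, 0.6, 0.6, -1.0], [0.04] * 4)
for label, est, lo, hi in rows:
    print(f"omit {label}: {est:.2f} ({lo:.2f} to {hi:.2f})")
```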

First oral feeding time

The first oral feeding time was defined as the time after surgery at which oral feeding was first administered. It was clearly reported in three studies [18, 19, 23], and these three NRCCTs were included in this analysis. The heterogeneity test showed no significant heterogeneity (χ² = 0.97, P = 0.614, I² = 0%), so a fixed-effects model was used. Compared with OR, TR significantly shortened the time to first oral feeding (SMD 0.652; 95% CI 0.27–1.035; P = 0.001) (Fig. 10).

Fig. 10

Meta-analysis of first oral feeding time

Length of hospital stay

Three NRCCTs analyzed the length of hospital stay for both surgical approaches and were included in this analysis. The heterogeneity test showed no significant heterogeneity (χ² = 3.24, P = 0.198, I² = 38.3%), so a fixed-effects model was used. Compared with TR, OR significantly increased the length of hospital stay (SMD 0.584; 95% CI 0.214–0.953; P = 0.002) (Fig. 11).

Fig. 11

Meta-analysis of hospitalization time

Blood loss

Four NRCCTs analyzed blood loss for both surgical approaches and were included in this analysis. The heterogeneity test showed significant heterogeneity (χ² = 35.62, P < 0.001, I² = 91.6%), so a random-effects model was used (Fig. 12). Compared with OR, TR did not significantly increase blood loss (SMD 0.048; 95% CI −1.292 to 1.388; P = 0.944).

Fig. 12

Meta-analysis of blood loss

Because of the significant heterogeneity, a sensitivity analysis was performed. It showed that excluding Koga et al.'s study [18] changed the pooled estimate markedly (Fig. 13). However, a repeat meta-analysis without this study still showed no significant difference in pooled SMD (SMD 0.60; 95% CI −0.38 to 1.57; P = 0.233) (Fig. 14).

Fig. 13

Sensitivity analysis of blood loss

Fig. 14

Meta-analysis of blood loss excluding Koga et al.’s study

Operative time

Seven NRCCTs analyzed operative time for both surgical approaches and were included in this analysis. The heterogeneity test showed no significant heterogeneity (χ² = 3.73, P = 0.713, I² = 0%), so a fixed-effects model was used. Compared with OR, TR significantly increased operative time (SMD 0.604; 95% CI 0.344–0.864; P < 0.001) (Fig. 15).

Fig. 15

Meta-analysis of operative time

GRADE evaluation for the level of evidence

Three outcomes were reported by both the observational studies and the RCT: the occurrence rate of leaks, the occurrence rate of strictures, and operative time. Their GRADE evaluation is shown in Table 2.

Table 2 GRADE evaluation of included studies

Discussion

The advantages of TR include excellent visualization and dissection of the posterior mediastinal structures [25], reduced post-operative narcotic use [15, 18, 19, 24], and better cosmetic outcomes. In 1985, Jaureguizar et al. [26] reported winged scapula, chest wall deformity, scoliosis, and mammary maldevelopment among 89 patients who had undergone OR for EA with TEF and had been followed up for more than 3 years. OR also requires lung retraction to expose the posterior mediastinum, which can cause lung damage and respiratory complications [18].

An increasing number of surgeons have adopted TR for EA with TEF, but not all patients are good candidates. TR is not suitable for patients with severe illness or major cardiac anomalies. Rothenberg [27] reported that the absolute contraindications to a thoracoscopic approach were severe hemodynamic instability requiring significant ventilatory support and significant prematurity (birth weight <1500 g). The relative contraindications were significant congenital cardiac defects, low weight (1500–2000 g), and significant abdominal distension. Yamoto et al. [23] used two criteria for TR: birth weight >2000 g, and absence of severe cardiac malformations and chromosomal aberrations. Holcomb et al. [7] also reported that endoscopic repair was difficult in patients weighing less than 2 kg and in those with significant lung disease. In summary, TR appears most reliable in patients weighing more than 2 kg who have no severe associated anomalies.
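
The selection criteria summarized above amount to a simple decision rule. The sketch below encodes only the thresholds quoted in this paragraph and makes three assumptions: birth weight is in grams, each clinical condition is a yes/no flag, and exactly 2000 g counts as a relative contraindication because Yamoto et al. required >2000 g. The function and parameter names are hypothetical.

```python
def tr_candidacy(birth_weight_g, severe_hemodynamic_instability=False,
                 significant_cardiac_defect=False, significant_abdominal_distension=False):
    """Classify thoracoscopic-repair candidacy from the thresholds quoted in the text.

    'absolute' and 'relative' follow Rothenberg's categories; the 2000 g
    boundary follows the >2 kg criterion (2000 g itself counts as relative).
    """
    if severe_hemodynamic_instability or birth_weight_g < 1500:
        return "absolute contraindication"
    if significant_cardiac_defect or significant_abdominal_distension or birth_weight_g <= 2000:
        return "relative contraindication"
    return "candidate"


print(tr_candidacy(2500))                                   # candidate
print(tr_candidacy(1800))                                   # relative contraindication
print(tr_candidacy(2600, severe_hemodynamic_instability=True))  # absolute contraindication
```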

The most important short-term and long-term outcomes after surgical repair of EA with TEF are anastomotic leaks and anastomotic strictures, respectively. In this study, the meta-analysis showed that, compared with OR, TR did not increase the occurrence of leaks or strictures. These pooled results were similar to those of previous meta-analyses [25, 28].

This was the first study to pool the rates of pulmonary complications and of fundoplication for post-operative GERD, and neither differed significantly between the two approaches. Although the pooled difference in pulmonary complications was not significant, two studies [17, 18] reported higher rates of repeated pneumonia and atelectasis after OR. Two other studies [21, 22] found pneumothorax and pleural empyema to be more common after TR, owing to the transpleural access and the artificial pneumothorax established during TR. Kawahara et al. [17] reported that TR neither significantly decreased subsequent GERD nor reduced the disturbance of esophageal motor function, which is consistent with our pooled results. A possible explanation is that esophageal motor dysfunction results more from inherent abnormal innervation than from intraoperative denervation [29].

First oral feeding usually starts 7 days after surgery, once an esophagogram has confirmed the integrity of the anastomosis and the absence of contrast leakage [2]. Our meta-analyses showed that, compared with OR, TR significantly shortened the time to first oral feeding and the length of hospital stay, suggesting a smoother recovery after TR. However, these results may be affected by selection bias, because surgeons preferred OR for more severely ill patients.

The meta-analysis showed a shorter operative time for OR. The longer operative time of TR is probably associated with the learning curve and the intracorporeal knot-tying maneuver [30]. A long gap between the proximal and distal pouches is also associated with longer operative time, and the narrow operating field and unsatisfactory exposure can further prolong TR. However, the TR technique has developed rapidly over the last 20 years. Rothenberg et al. [1] reported their 10-year experience of TR for EA in 2014, with a much shorter average operative time than in the series reported by Nguyen et al. [30] more than 15 years earlier. A longer TR operative time leads to greater CO2 absorption into the blood from the artificial pneumothorax, but several studies [18, 20, 23, 24] reported that TR was not associated with postoperative hypercapnia and acidosis. Because only one included study [24] analyzed intraoperative hypercapnia and acidosis, intraoperative blood gas data could not be meta-analyzed.

Initially, six studies were included in the analysis of postoperative ventilation time, and they showed no significant difference between the two approaches. However, the sensitivity analysis revealed that excluding one study [21] reduced heterogeneity and substantially changed the pooled result. The primary finding of no significant difference is therefore not robust. We examined how the excluded study influenced the pooled result. Only Matsunari et al. [21] reported a longer postoperative ventilation time after TR, which may be explained by its having the smallest sample size among the included studies. Heterogeneity was also found in the analysis of blood loss. However, excluding Koga et al.'s study [18] did not significantly change that pooled result, indicating that the primary pooled result for blood loss is stable.

We also assessed the level of evidence using the GRADE system. The quality of evidence from the pooled NRCCTs was low for the first two outcomes (leaks and strictures) and very low for the third (operative time), because the evidence derived from pooled NRCCTs was limited. Evidence from RCTs starts as high quality in GRADE. However, the included RCT had significant limitations, lacking allocation concealment and blinding, and was therefore rated as only moderate quality.

The most recent comparable systematic review [28], published in 2016, had the following shortcomings: (1) incomplete literature retrieval, which might have caused selection bias; (2) inclusion of a study [31] with patients who had type A EA (EA without TEF); (3) no quality assessment or risk-of-bias evaluation of the RCT; (4) no explanation of how the time to first oral feeding was determined; and (5) no sensitivity analysis of strictures despite moderate heterogeneity.

Our study addressed these shortcomings. After double-checking the literature search and study selection, we included two additional eligible studies [15, 21] and excluded two studies that enrolled patients with EA without TEF [31, 32]. We evaluated the RCT and the observational studies separately, each with appropriate criteria: we meta-analyzed the observational studies and performed a quality assessment of the single RCT. We used the GRADE system to assess the quality of evidence, defined how the first oral feeding time was determined, and conducted sensitivity analyses when I² > 50%. We also summarized the indications for TR in EA with TEF. Nevertheless, our study has several limitations. First, only one RCT was included, so a meta-analysis of RCTs was not possible. Second, only seven studies were included in the funnel plot, so the publication bias tests had limited power and their results may be inaccurate. Third, long-term follow-up data were lacking: only four studies [17, 19, 23, 24] reported follow-up beyond 1 year, so the occurrence of long-term complications (e.g., strictures, GERD) remains unknown. Finally, most included studies were observational and had small sample sizes.

In conclusion, compared with OR, TR significantly reduced the length of hospital stay and the time to first oral feeding, but it was associated with a longer operative time. The rates of leaks, strictures, pulmonary complications, and fundoplication for GERD, as well as blood loss, were similar between the two surgical approaches. The primary analysis of ventilation time showed no difference between the approaches. This result remains uncertain, however, because it changed in the sensitivity analysis. Based on the GRADE system, the recommendation level was only C. Because only one RCT has compared the clinical outcomes of the two approaches, large multicenter RCTs are needed to clarify the differences in clinical outcomes between TR and OR for EA with TEF. Longer follow-up is also needed to evaluate long-term complications after surgery.