Introduction

Esophageal cancer is the ninth-most common cancer and the sixth-most common cause of cancer-related death globally [1]. For resectable esophageal cancer, esophagectomy has remained the mainstay treatment. Since Cuschieri et al. reported the first thoracoscopic esophagectomy in 1992 [2], the procedure has spread rapidly along with improvements in surgical techniques and equipment. In Japan, it was reported that thoracoscopic and/or laparoscopic esophagectomy was performed in 74.0% of superficial esophageal cancers and 55.3% of advanced esophageal cancers in 2017, and the number of these cases has been increasing for a number of years [3]. Minimal invasiveness was thought to be achieved by minimizing chest wall destruction while performing sophisticated surgical manipulation with magnified visualization.

The current conclusions regarding the safety and effectiveness of thoracoscopic esophagectomy have been based on one randomized controlled trial (RCT) (TIME trial) [4] and numerous retrospective cohort studies from limited institutes (CLIs). The RCT revealed that the incidence of pneumonia in the minimally invasive esophagectomy group was significantly lower than that in the open esophagectomy group. Similar benefits were detected in CLIs [5,6,7,8,9,10]. However, it should be noted that many of these studies were reported from high-volume centers that treated large numbers of esophageal cancer patients.

In recent years, thoracoscopic esophagectomy has become popular, not only in high-volume centers but also in low-volume hospitals. The authorized Institutes for Board Certified Esophageal Surgeons (AIBCESs) in Japan require that more than 50 surgeries for esophageal disease be performed within a period of 5 years (about 10 esophageal surgeries per year) [11]. Thoracoscopic surgery was reportedly performed in 56.2% of non-AIBCESs and 66.1% of AIBCESs [12]. However, the overall outcome in clinical practice is unclear. Surgeons who perform thoracoscopic esophagectomy must ensure a good view in the limited space of the thoracic cavity. In addition, because they are forced to work through intercostal ports, the movements of surgical instruments are restricted to some extent compared to open surgery; thus, it is essential to standardize procedures as a team in each institute. For this purpose, a stable surgical team and a large number of eligible patients are required. As such, inter-institutional disparity may exist in the short-term outcomes of thoracoscopic esophagectomy. It is unclear whether widespread thoracoscopic esophagectomies performed by diverse surgeons at various facilities in clinical practice will actually benefit patients in the way suggested by the positive results from RCTs and CLIs.

It has been said that “thoracoscopy suppresses the occurrence of pneumonia in comparison to thoracotomy”, but does this reflect real clinical practice? In recent years, the importance of big data has been recognized [13], and the results of many retrospective cohort studies based on nationwide databases (CNDs) have been reported.

To answer the clinical question described above, we examined the hypothesis that the results comparing the incidence of pneumonia after thoracoscopic esophagectomy to open esophagectomy differed by study design. Specifically, using a systematic review and meta-analysis, we compared the results of RCTs and CLIs in which a large number of high-volume centers were the main participants to the results of CNDs in which both high-volume centers and low-volume hospitals participated. The systematic review and meta-analysis were prepared by collecting the data listed in the esophageal cancer practice guidelines 2017 by the Japan Esophageal Society [14, 15]. There have been no reports on systematic reviews comparing these three types of research formats.

Methods

The present systematic review and meta-analysis was conducted to compare the short-term outcomes of thoracoscopic and open esophagectomy for esophageal cancer in the above-mentioned research formats, based on the recommendations of the PRISMA statement [16].

Search strategy

Using the electronic PubMed, Cochrane Library and Igaku Chuo Zasshi databases, we identified all comparative studies published from January 1, 1995, to September 26, 2019, that compared thoracoscopic and open esophagectomy. The following search terms were used: (((“esophageal neoplasms/surgery” OR “esophagectomy/methods” OR “Esophagectomy/mortality”) AND ((“minimally invasive” OR “thoracic surgery, video-assisted”) OR “thoracoscopy”)) AND ((((((((“risk factors” OR “risk assessment”) OR “mortality”) OR “cohort studies”) OR “prognosis”) OR “survival analysis”) OR “Outcome Assessment (Health Care)”) OR “morbidity”) OR “analysis of variance” OR “quality of life” OR systematic OR meta-analysis)).

Study selection

Many studies on minimally invasive esophagectomy (MIE) have been reported. However, in these studies, MIE included various surgical procedures, such as thoracoscopic esophagectomy, laparoscopic transhiatal esophagectomy, and laparoscopic gastric conduit construction. In the present study, we limited the minimally invasive surgical procedure to thoracoscopic esophagectomy.

The inclusion criteria were as follows: (1) randomized controlled trials, prospective or retrospective cohort studies, (2) comparing the post-operative complications of thoracoscopic esophagectomy to open esophagectomy, (3) > 95% of participants diagnosed with esophageal cancer, and (4) performance of Ivor-Lewis or McKeown esophagectomy. In addition, we allowed open left transthoracic esophagectomy.

The exclusion criteria were as follows: (1) Studies not comparing thoracoscopic surgery to open surgery, (2) Studies with overlapping subjects, (3) Studies with inappropriate subjects, (4) Studies with inappropriate surgical techniques, (5) Studies with inappropriate endpoints, (6) Review articles without new results, (7) Protocol papers, (8) Studies without peer review, (9) Studies without reliable contents, and (10) Studies presented in languages other than English.

Data extraction

Data extraction was performed in duplicate by two investigators (K.M. and M.U.), using the Microsoft Excel software program (Microsoft Corporation, Redmond, WA, USA). First, the investigators checked all titles and abstracts and selected candidate studies. Second, they read the whole manuscripts of these studies and selected eligible studies according to the above-mentioned inclusion and exclusion criteria. These tasks were separately performed by the two investigators, and the selected studies were finally integrated. In cases of disagreement between the two investigators, a third investigator (Y.A.) was asked to check, and the study was discussed until a consensus was reached regarding its eligibility. Afterwards, the data, including the study name, year of publication, study design, the number of patients, coverage period, surgical technique, 30-day mortality, morbidity, and the incidence of pneumonia, anastomotic leakage and recurrent nerve paralysis, were extracted.

Statistical analyses

All statistical analyses were performed using the Review Manager (RevMan) software program (version 5.4.1, The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen, Denmark).

All of the statistical results were determined using random-effects models because it was thought that the true outcome of each study might vary in clinical observational studies. The I² statistic was used to assess statistical heterogeneity, and publication bias was assessed using funnel plots.
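The pooling described above can be illustrated with a minimal sketch. It assumes inverse-variance weighting with the DerSimonian-Laird estimator of between-study variance, which is a common random-effects approach; the source does not state which weighting option was selected in RevMan. The function names and the 2x2 counts are hypothetical and are not data from the included studies.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance from one 2x2 table.

    a, b: events and non-events in the thoracoscopic group
    c, d: events and non-events in the open group
    A 0.5 continuity correction is applied if any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    y = math.log((a * d) / (b * c))
    v = 1 / a + 1 / b + 1 / c + 1 / d
    return y, v


def random_effects(tables):
    """DerSimonian-Laird pooled odds ratio, 95% CI, tau^2 and I^2 (%)."""
    effects = [log_odds_ratio(*t) for t in tables]
    ys = [y for y, _ in effects]
    vs = [v for _, v in effects]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(tables) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical pneumonia counts: (events, non-events) per group.
example_tables = [(10, 90, 20, 80), (5, 95, 12, 88), (8, 42, 9, 41)]
print(random_effects(example_tables))
```

An odds ratio below 1 in this sketch favors the thoracoscopic group, matching the orientation of the forest plots.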

Quality assessments

Each study was reviewed separately by two independent investigators (K.M. and M.U.), based on the methods described in the Cochrane Handbook for Systematic Reviews of Interventions [17].

Results

Study characteristics (eligible studies)

The process of the literature search is illustrated as a flow chart in Fig. 1. Initially, a total of 1186 studies were identified through searches of the 3 electronic databases. In the first screening, 921 studies whose titles and abstracts were incompatible with the purpose of this study were excluded. In the second screening, 222 studies were excluded based on the above-mentioned criteria. Thus, 43 studies, including 1 RCT, 38 CLIs, and 4 CNDs, were included in the analysis of the present study (Table 1).

Fig. 1

Flow chart of the literature search. RCT, randomized controlled trial; CLI, retrospective cohort study from limited institutes; CND, retrospective cohort study based on a nationwide database

Table 1 Overall characteristics of the studies

The number of participants in each study category was 115 in the RCT, 6,126 in the CLIs, and 14,816 in the CNDs. There were three other RCTs about MIE, but one was the MIRO trial, in which hybrid MIE was defined as laparoscopic mobilization and open right thoracotomy. Thoracoscopic surgery was not included in this procedure [18], which met one of the exclusion criteria in our study. The other two RCTs were trials for which the final results had not been obtained (ROMIO study [19, 20] and JCOG1409 [21]). Thus, these three trials were excluded. We also extracted four CNDs, which included patients who underwent transthoracic esophagectomy between 2008 and 2011 in the United States [22], 2011 and 2015 in the Netherlands [23], 2011 and 2012 in Japan [24], and 2007 and 2014 in Finland and Sweden [25].

The short-term outcomes are shown in Fig. 2.

Fig. 2

Forest plots of the odds ratio for each complication comparing thoracoscopic esophagectomy with open esophagectomy. RCT, randomized controlled trial; CLI, retrospective cohort study from limited institutes; CND, retrospective cohort study based on a nationwide database; CI, confidence interval

The 30-day mortality

The same results were demonstrated in all study designs. No significant difference in the 30-day mortality between the thoracoscopic and open groups was observed in the RCT (p = 0.59), CLI (p = 0.33), or CND meta-analyses (p = 0.26) (Fig. 2A).

Morbidity

Although a significant difference was observed in the CLI meta-analysis (p = 0.003), there was no marked difference in the CND meta-analysis (p = 0.84). In the CLI meta-analysis, the morbidity rate in the thoracoscopic group was lower than that in the open group (Fig. 2B).

Pneumonia

We initially investigated the definition of pneumonia in the 29 included articles. However, only 7 [4, 6, 9, 22, 24, 26, 27] of the 29 articles described the definition of pneumonia. In these 7 studies, pneumonia was defined by a combination of radiographic appearance using chest X-ray or computed tomography (CT) findings, the clinical appearance (e.g. a high fever), and blood test results, such as an increase in the C-reactive protein levels or white blood cell count. A positive sputum culture was essential in only one study [4].

To define the degree of pneumonia, the Common Terminology Criteria for Adverse Events (CTCAE) version 3.0 was adopted in one study (higher than grade 2) [28], and the Clavien–Dindo classification was adopted in three studies (grades 1–2 in one study [10], higher than grade 1 in one study [29], and higher than grade 2 in one study [30]). Further details were unclear, but it was speculated that pneumonia requiring antibiotics was the subject in most included studies.

The results regarding pneumonia were similar to those regarding morbidity. Although a significant difference was observed in the RCT (p = 0.005) and in the CLI meta-analysis (p = 0.003), there was no marked difference in the CND meta-analysis (p = 0.69). In the RCT and the CLI meta-analysis, the incidence of pneumonia in the thoracoscopic group was lower than that in the open group (Fig. 2C).

Anastomotic leakage

The results were similar among groups with no significant differences (RCT: p = 0.39, CLI: p = 0.95, CND: p = 0.10) (Fig. 2D).

Recurrent nerve palsy

The results regarding recurrent nerve palsy differed among the study designs. The incidence in the thoracoscopic group was lower than that in the open group in the RCT (p = 0.012), no significant difference was observed in the CLI meta-analysis (p = 0.33), and the incidence in the thoracoscopic group was greater than that in the open group in the CND meta-analysis (p = 0.002) (Fig. 2E).

The I² values in the two meta-analyses concerning the 30-day mortality were both 0%. However, the I² values in the CLI and CND meta-analyses for morbidity were 45% and 63%, respectively. Regarding pneumonia, the I² values in the CLI and CND meta-analyses were 47% and 52%, respectively. These findings suggest that these studies tended to have moderate heterogeneity.

Funnel plots showed no significant publication bias in the 30-day mortality, morbidity, anastomotic leakage, or recurrent nerve palsy (Fig. 3). However, in pneumonia, the lower right dots were missing, and an asymmetric inverted funnel shape was shown. Thus, the results concerning this complication might have been affected by publication bias.

Fig. 3

Funnel plots showing the publication bias for each complication. CLI, retrospective cohort study from limited institutes
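As a numeric complement to visual inspection of a funnel plot, asymmetry can be summarized with the intercept of Egger's regression. This technique was not part of the reported analysis, which relied on the funnel plots alone; it is swapped in here purely as an illustration. The function name and the input values are hypothetical.

```python
def egger_intercept(points):
    """Intercept of the regression of (effect / SE) on (1 / SE).

    points: list of (log odds ratio, standard error) pairs, one per study.
    An intercept far from zero suggests funnel-plot asymmetry.
    """
    n = len(points)
    if n < 3:
        raise ValueError("at least three studies are needed")
    xs = [1 / se for _, se in points]    # precision
    zs = [y / se for y, se in points]    # standardized effect
    mx = sum(xs) / n
    mz = sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standard errors must differ between studies")
    slope = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sxx
    return mz - slope * mx


# Hypothetical studies: smaller studies (larger SE) report stronger effects.
example_points = [(-0.3, 0.1), (-0.5, 0.2), (-0.9, 0.4), (-1.4, 0.8)]
print(egger_intercept(example_points))
```

A formal test would also need the standard error of the intercept, which this sketch omits.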

Discussion

The frequency of pneumonia, the most notable complication, was significantly lower in the thoracoscopic group in the RCT and the CLI meta-analysis, while no significant difference was noted in the CND meta-analysis. This suggests that the short-term outcomes after thoracoscopic esophagectomy, especially the incidence of pneumonia, might differ among institutes.

Thoracoscopic esophagectomy was expected to reduce general complications because it involved less chest wall destruction and was associated with reduced chest wound pain [31, 32]. The rate of pneumonia, one of the most important complications after esophagectomy, was expected to decrease. However, while the RCT and the CLI meta-analysis showed a significant decline in the pneumonia rate in the thoracoscopic group, no marked difference was found in the CND meta-analysis. This was the most important finding of the present study. The institutes in which the RCTs and CLIs were performed included many high-volume centers. What, then, about the institutes participating in CNDs? In the Society of Thoracic Surgeons (STS) National Database from the United States, 63 institutions registered 800 cases of minimally invasive esophagectomy (MIE) from 2008 to 2011. On average, the number of cases per institution was less than 4.31 per year. Only 2 of these centers performed at least 20 minimally invasive resections per year, while the vast majority performed between 1 and 10 per year between 2009 and 2011 [22]. In the National Clinical Database (NCD) from Japan, 864 institutions registered 3589 MIE cases from January 1, 2011 to December 31, 2012. On average, the number of cases per institution was only 2.08 per year [24]. Considering the number of registered cases in high-volume centers with more than 20 cases per year, it was likely that a significant number of low-volume hospitals were involved. In the Dutch Upper Gastrointestinal Cancer Audit (DUCA) database, a relatively large number of high-volume centers participated. The rate of hospitals with 0–20 esophageal resection procedures per year was 2.4%, while that with 21–40 esophageal resection procedures per year was 43.2%. However, the exact number of participating facilities has not been announced [23]. Conversely, in Finland and Sweden, many low-volume centers were involved. Eleven institutions registered 217 cases of MIE from January 2007 to October 2014. On average, the number of cases per institution was about 2.82 per year [25]. The superior outcomes of thoracoscopic esophagectomy that were reported in the RCT and the CLI meta-analysis but not in the CND meta-analysis might therefore be attributed to differences in the participating institutions. However, based on the results of the funnel plot analysis, there may have been some publication bias in the CLI meta-analysis with regard to pneumonia. For this reason, the CLI meta-analysis of pneumonia should be regarded as reference information only.
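The per-institution volumes quoted above follow from dividing the registered cases by the number of institutions and the length of the registration period. A minimal sketch of that arithmetic, using the NCD figures from the text, is given here; the function name is hypothetical, and the 2-year period is taken from the stated registration dates.

```python
def annual_cases_per_institution(cases, institutions, years):
    """Average number of cases per institution per year."""
    return cases / (institutions * years)


# NCD (Japan): 3589 MIE cases, 864 institutions, 2011-2012 (2 years).
ncd = annual_cases_per_institution(3589, 864, 2)
print(round(ncd, 2))  # 2.08
```

The same function applies to the other databases, provided the length of the registration period in years is known.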

The quality of thoracoscopic esophagectomy performed in clinical practice, which includes many low-volume hospitals, was likely to vary, and it was hypothesized that this heterogeneity affected the short-term outcomes, such as the incidence of pneumonia. These results suggest that achieving the maximum benefit from the minimal invasiveness of thoracoscopic esophagectomy requires sufficient surgical skill, based on a great deal of experience. Thus, surgeons and their affiliated institutes must accumulate a great deal of experience and endeavor to master and standardize thoracoscopic procedures.

The 30-day mortality rate did not differ in any of the study designs. The results in relation to morbidity were similar to those for pneumonia. There was no difference in anastomotic leakage in any study design, suggesting that it was not influenced by differences in the surgical approach. The results in relation to recurrent nerve palsy differed in each of the study designs. These results may have depended on the country in which the study was conducted. The emphasis on the dissection of the superior mediastinal lymph nodes, particularly the lymph nodes around the recurrent laryngeal nerve, varied from country to country. Especially in Japan, which was regarded as the most important base for lymph node dissection, an improvement in the prognosis was observed [33, 34]. The results of the CND meta-analysis were presumed to reflect this.

RCTs are strictly managed and conducted, and the patients and doctors who participate in these clinical trials are narrowed down to ensure their quality [35, 36]. These processes give statistical support. The results obtained through these methods are statistically superior, reliable, and extremely important. For this reason, RCTs are positioned at the top of the hierarchy of evidence [36], and their results have been used as evidence for determining recommendations in many guidelines. However, we must be aware that the meaning of the results obtained from RCTs testing new surgical procedures differs from that of results obtained from RCTs testing new drugs, such as anticancer or antihypertensive drugs. No matter who administers the drug, the results are almost the same. However, the same cannot be said for surgical procedures, and even if the surgical procedure itself is excellent, the results will often depend on the surgeon performing it. This may dissociate the quality of a widely practiced surgical approach from the assessment of the procedure itself, which is verified in RCTs. The greater the technical difficulty of the surgery, the more likely the quality of a widely practiced surgical approach is to vary. The findings of RCTs are obtained from high-volume surgeons at high-volume centers and do not necessarily reflect the results at the population level [25]. Given that studies related to surgical procedures differ from those targeting anticancer agents, care must be taken when interpreting the findings of studies examining the usefulness of high-level surgical procedures. While the results of RCTs targeting high-level surgical procedures should be generalized, we should not apply them to all cases.

In conclusion, unlike RCTs and CLIs, CNDs did not show the superiority of thoracoscopic surgery in terms of post-operative pneumonia. RCTs and CLIs were predominantly performed by high-volume hospitals, while CNDs were often performed by low-volume hospitals. In actual clinical practice including various types of hospitals, the superiority of thoracoscopic over open esophagectomy regarding the incidence of pneumonia may therefore decrease.

Limitations

The present study is associated with five main limitations.

First, in the meta-analysis on pneumonia, publication bias might have existed. However, the present study was a meta-analysis of observational cohort studies, so the possibility of some publication bias was considered to be an overall limitation associated with the present study. Large-scale verification in future RCTs is expected. Second, only one RCT comparing the short-term outcomes of thoracoscopic and open esophagectomy was available for this analysis. When the final results of the ROMIO study and JCOG1409 are reported in the future, the conclusion of this study may change. Third, we could only obtain CND results from five countries: the USA, the Netherlands, Japan, and Finland/Sweden. Although there was also a national database in the UK, there were no appropriate reports that fit the criteria of this study. Similarly to the situation regarding RCTs, the conclusions may change if the number of reports derived from CNDs increases. Fourth, it could not be ruled out that the abdominal surgical procedure influenced the development of pneumonia in the present study. For example, 29 studies discussing pneumonia were extracted in the present study, with only 13 describing abdominal procedures in detail. Therefore, it is difficult to discuss the effect of abdominal procedures on the development of pneumonia accurately. However, in 10 of the above-mentioned 13 studies, both thoracoscopic and laparoscopic procedures in the thoracoscopic group and both thoracotomy and laparotomy procedures in the open group were adopted. Considering that similar thoracic and abdominal approaches were used in at least 10 of the 29 studies, the abdominal procedure might have affected the outcomes in each study. Finally, the results of the analyses performed in this study may change over time. If all surgical teams acquaint themselves with thoracoscopic procedures, there may be a significant change in the incidence of post-operative pneumonia, even in CND-based studies. This is an issue that should be evaluated going forward.