Achalasia is an esophageal motility disorder manometrically characterized by a non-relaxing lower esophageal sphincter (LES) in association with aperistalsis of the esophageal body. It is a rare condition, with an incidence of 0.3–1.63 per 100,000 adults/year and a prevalence of 8.7–10.8 per 100,000 [1, 2]. Although multiple theories (viral, inflammatory, autoimmune) targeting esophageal ganglion cells have been studied, the cause of neuronal degeneration in the esophageal wall remains unknown.

Achalasia was recognized more than 300 years ago as a disease that impairs the ability to swallow. At that time, the proposed treatment consisted of traumatic dilation of the gastroesophageal junction using a whalebone [3]. In 1913, Ernst Heller performed the first successful operation for achalasia, describing an anterior and posterior lower esophageal myotomy through a laparotomy. The surgical technique evolved into a single anterior myotomy through a left posterior thoracotomy, resulting in a symptomatic success rate of 60–94 % [4]. Nonetheless, the relatively high morbidity of this approach made it less attractive than dilation techniques until the advent of the minimally invasive surgical approach in the early 1990s [5, 6].

Despite the fact that surgical technique has changed, the target of reducing the LES gradient pressure remains the same. Furthermore, the extension of the myotomy 3 cm onto the proximal stomach in order to divide the gastric sling fibers to further decrease LES pressure has been associated with improved results. The resulting gastroesophageal reflux is addressed with an antireflux procedure [7].

Laparoscopic Heller myotomy (LHM) is considered the standard of care since it provides superior and long-lasting symptom relief for patients with achalasia compared to other modalities of treatment, with endoscopic treatments reserved for frail and/or aged patients or those who refuse to have an operation [8–12]. However, advances in endoscopic procedures, which are even less invasive than laparoscopy, have brought a novel endoscopic option named peroral endoscopic myotomy (POEM) into the discussion.

The use of endoscopic myotomy for the treatment of achalasia was first reported in 1980. This technique was described by Ortega et al. [13] as a mucosal and circular muscle myotomy that was performed above the gastroesophageal junction (GEJ) using an endoscopic knife developed by the same group. Nevertheless, it was not universally accepted due to concerns about a high risk of perforation. In 2007, Pasricha et al. [14, 15] reported the feasibility of performing endoscopic esophageal myotomy in four pigs by creating a submucosal esophageal tunnel; however, it was not until 2010 that the first human study was published. Since then, the procedure has expanded and is becoming more widely adopted by gastroenterologists and surgeons.

With the increased interest in replacing LHM with the less invasive POEM over the last 5–6 years, it is essential to assess the available efficacy and safety data of POEM to guide management decisions and future research. We aim to evaluate the effect of POEM in reducing LES resting pressure and symptom improvement.

Methods

Protocol

A systematic search was performed in Medline, Embase, Cochrane Central Register, and PubMed to identify all citations investigating the efficacy and/or safety of POEM to treat patients with achalasia until February 2015. Our search strategy is shown in Appendix 1 in electronic supplementary material. Keywords including esophageal achalasia, peroral endoscopic myotomy, endoscopy, natural orifice surgery, laparoscopic Heller myotomy, and related terms were used. Studies using human subjects with no ethnicity, age, or language restrictions were included. Narrative letters or reviews and studies reporting fewer than 10 subjects were excluded from full-text eligibility assessment.

After full-text analysis, we also excluded studies with less than 3 months of follow-up, studies with repetitive data, and studies lacking LES resting pressure and Eckardt score. The systematic review was performed following the PRISMA guidelines [16].

Articles in non-English language were translated with SYSTRAN software. At the time of the search, no published RCT was found. Thus, we consulted clinicaltrials.gov, confirming the status of “recruiting patients” for two ongoing randomized trials comparing POEM versus LHM (identifiers: NCT02138643 and NCT01601678). Additional records were obtained from bibliography and abstracts searched from the Society of American Gastrointestinal and Endoscopic Surgeons and Digestive Disease conferences between 2010 and 2015.

Two authors reviewed all identified citations from the literature search. Those studies that aim to explore POEM efficacy using Eckardt score and/or LES pressure and those reporting complications were selected for full-text review to determine whether they met inclusion criteria. A third reviewer resolved all disagreements. A data abstraction form was pilot-tested and approved by all authors. Two authors performed data extraction independently.

We explored heterogeneity among studies to determine the appropriateness and validity of pooling evidence across the studies using meta-analysis. Clinical and methodological heterogeneity was assessed through a careful examination of study, intervention, and patient characteristics. Statistical heterogeneity was examined and quantified using the I² statistic. Analysis was done for preoperative and postoperative data separately as well as for the mean change between pre- and postoperative data. In the analysis involving mean change, complete data including the standard error of the mean change were not provided in all the studies. When data were incomplete, the correlation between pre- and postoperative outcomes was calculated from studies with complete data. The standard error of the mean change was then imputed using the standard errors of the preoperative and postoperative mean values and correlation estimates from similar studies. NIH tools and guidelines for case series, case control, and before and after studies were independently and dually utilized to assess risk of bias [18]. Due to marked variability in study design, quality of the data, participants, and outcome measures as well as the observed high statistical heterogeneity, study-level estimates are presented in forest plots without pooling the estimates across the studies. The heterogeneity estimates are also presented.
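
The imputation and heterogeneity steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: it assumes the standard formula for the standard error of a paired difference and the usual definition of I² from Cochran's Q with inverse-variance weights. The function names and all numbers in the usage note are hypothetical.

```python
import math


def se_mean_change(se_pre, se_post, r):
    """Standard error of a pre-post mean change for paired data.

    Assumes SE_change^2 = SE_pre^2 + SE_post^2 - 2*r*SE_pre*SE_post,
    where r is the pre/post correlation (borrowed from studies with
    complete data when a study does not report it).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    variance = se_pre ** 2 + se_post ** 2 - 2.0 * r * se_pre * se_post
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


def i_squared(estimates, standard_errors):
    """I^2 (in percent) from Cochran's Q with inverse-variance weights."""
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0.0:
        return 0.0
    # I^2 is truncated at zero when Q falls below its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0
```

With hypothetical inputs, `se_mean_change(3.0, 4.0, 0.0)` gives 5.0, and two studies with estimates of 0 and 10 and standard errors of 1 give an I² of 98 %. Values of this size are in the range reported in the Results and illustrate why pooling was not attempted.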

Types of intervention

Only the intervention of peroral endoscopic myotomy (POEM) as described by Inoue et al. [15] was included. This procedure follows four consecutive steps: (a) mucosal incision and entry in the submucosa; (b) creation of the submucosal tunnel; (c) esophageal myotomy; (d) closure of the mucosal incision.

Laparoscopic Heller myotomy with fundoplication included the following main steps: (a) mobilization of the gastric fundus and mediastinal esophagus; (b) esophagogastric myotomy; (c) partial fundoplication (Dor or Toupet).

Outcome

The outcomes of interest were efficacy and safety. Efficacy was evaluated as improvement in symptom scores and LES pressure profile measured by manometry. Eckardt score is the most commonly used symptom score for assessing the severity of achalasia [17]; it is the sum of the scores for dysphagia, regurgitation, and chest pain on a scale from 0 to 3 (0 = absent, 1 = occasional, 2 = daily, 3 = each meal) and weight loss (0 = no weight loss, 1 = <5 kg, 2 = 5–10 kg, 3 = >10 kg). The total score ranges from 0 to 12 points. Clinical remission was defined as an Eckardt score of ≤3.
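
The scoring rule above can be written as a short Python sketch. It follows the item grades and the remission threshold given in the text; the function and variable names are hypothetical, and the handling of boundary values for weight loss (exactly 5 kg and exactly 10 kg fall in grade 2) follows the ranges as printed.

```python
ITEMS = ("dysphagia", "regurgitation", "chest_pain", "weight_loss")


def eckardt_score(dysphagia, regurgitation, chest_pain, weight_loss):
    """Sum the four Eckardt items, each graded 0-3, into a 0-12 total."""
    grades = (dysphagia, regurgitation, chest_pain, weight_loss)
    for name, grade in zip(ITEMS, grades):
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be a grade from 0 to 3")
    return sum(grades)


def weight_loss_grade(kg_lost):
    """Map weight loss in kg to the 0-3 grade given in the text."""
    if kg_lost <= 0:
        return 0
    if kg_lost < 5:
        return 1
    if kg_lost <= 10:
        return 2
    return 3


def in_clinical_remission(score, threshold=3):
    """Remission as defined in this review: Eckardt score <= 3."""
    return score <= threshold
```

For a hypothetical patient with dysphagia at each meal (3), daily regurgitation (2), occasional chest pain (1), and 7 kg of weight loss (grade 2), `eckardt_score(3, 2, 1, weight_loss_grade(7))` returns 8, which does not meet the remission criterion.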

Safety of the procedure was assessed by reported complications including the development of gastroesophageal reflux disease (GERD) measured by postoperative 24-h pH studies.

Study selection for analysis

To avoid duplicate data, we included the latest paper and/or the one with the largest sample size from the same author or center when there was overlap of author names or centers. Seven of the ten investigators who were contacted to clarify study eligibility responded. We developed a data abstraction form considering relevant outcomes to answer the research question. The form was pilot-tested by one of the authors and accepted with modifications by all authors in consensus.

Extracted data included study characteristics, demographics (BMI, age, and gender), previous treatments, details of the surgical intervention, primary outcome results, GERD assessment, and length of follow-up.

Results

General description

Our search strategy identified 2894 citations. After duplicates were removed, two reviewers assessed 2112 citations, of which 54 studies were eligible for full-text evaluation (see Fig. 1, PRISMA flow diagram). No randomized controlled trials were found. All 54 studies were considered in our qualitative analysis; nevertheless, some of the studies were excluded due to data repetition and other important characteristics we present below. A total of 19 of 54 studies were included in the final qualitative analysis [15, 18–70].

Fig. 1 PRISMA diagram showing study selection

Among 54 studies, five compared efficacy of POEM versus LHM [22, 27, 31, 33, 54] as shown in Table 1. Another study compared the extension of the myotomy to normalize EGJ distensibility between POEM and LHM [46]. Other comparisons included POEM in patients with and without previous endoscopic treatment [20, 29, 48], longitudinal versus modified mucosal incision [60, 69], water jet versus conventional dissection [23]. There were no comparisons with PD.

Table 1 Studies comparing POEM versus LHM

Exclusion of studies

Twenty-eight studies reported repetitive data from the same center/cohort of patients [14, 17, 20, 22–25, 27–43, 45, 46, 49, 54], five reported outcomes with less than 3 months of follow-up [18, 19, 21, 26, 50], and two did not report outcomes using Eckardt score or LES pressure [44, 48]. All were excluded from the qualitative synthesis.

Qualitative analysis of the included studies

Nineteen studies were included in the qualitative analysis. The success rate of completing the procedure and the effectiveness in achieving a postoperative Eckardt score lower than 3 were high. Of 1310 POEM procedures attempted, 1299 were completed (99 %), and a post-procedural Eckardt score was available in 1228 of these patients. The procedure was considered effective in 1171 patients (Eckardt score < 3 in all studies except one, which used Eckardt score < 4 [65]). Post-procedure LES resting pressure was reported in 836 cases (Table 2). Age and sex were reported in 17 of 19 studies. Only 3 studies reported BMI [47, 52, 64]. Ten retrospective and nine prospective studies were identified. No randomized controlled trial was found. Seventeen studies reported the presence of previous treatment, but the type of intervention or number of pre-POEM treatments was not uniformly described. Thus, we grouped any previous endoscopic treatment (n = 289) and previous Heller myotomy (n = 36). Nine studies reported the presence of sigmoid esophagus in 99 patients (Table 2).
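
The proportions implied by these counts can be checked with a few lines of Python. The counts are those reported above; the resulting percentages are simple arithmetic on crude totals across heterogeneous studies, not pooled estimates, and the helper name is hypothetical.

```python
def percentage(part, whole):
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * part / whole


attempted = 1310       # POEM procedures attempted
completed = 1299       # procedures completed
with_eckardt = 1228    # patients with a post-procedural Eckardt score
effective = 1171       # patients meeting each study's success threshold

completion_rate = percentage(completed, attempted)
efficacy_rate = percentage(effective, with_eckardt)

print(f"Completion: {completion_rate:.1f} % of attempted procedures")
print(f"Effective:  {efficacy_rate:.1f} % of patients with a reported score")
```

This gives a completion rate of 99.2 % and a crude effectiveness of 95.4 % among patients with a reported post-procedural score. Because the denominator excludes patients without a reported score, the second figure should be read as descriptive only.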

Table 2 General characteristics of 19 selected studies

Technical aspects of POEM

Myotomy length was not uniformly reported. Seventeen of 19 studies reported the total length of myotomy, and only seven studies reported the length of gastric extension. The total length of myotomy was variable from 5.4 to 26 cm (Table 3). The most frequent myotomy technique was reported as a “partial” or “circular,” referring to the dissection involving only the circular muscle layer, preserving the longitudinal muscle layer. There was a paucity of data reporting the number of “full-thickness” myotomies either intentionally or inadvertently as part of the procedure (Table 3).

Table 3 Technical heterogeneity in the 19 selected studies

Main outcomes and heterogeneity among studies

All 19 studies reported pre- and post-POEM Eckardt score and/or LES pressure. However, 5 studies did not provide any measure of variability (e.g., SD, IQR, or range) for the preoperative Eckardt score and hence were not included in the statistical analysis. A very high statistical heterogeneity was observed in both the pre- and postoperative Eckardt scores. Fourteen studies reported the preoperative Eckardt score (I² = 89.45 %), and 13 studies reported the postoperative Eckardt score (I² = 97.28 %) (Fig. 2). We also investigated the distribution of the mean change in Eckardt score between the pre- and postoperative measurements across the studies for which data were available. An improvement was observed across all the studies; however, there was a high level of heterogeneity (I² = 90.24 %) in the mean change from the preoperative value. Heterogeneity in mean change remained high when prospective and retrospective studies were considered separately (I² = 80.95 % for prospective studies and I² = 94.71 % for retrospective studies). Given the high level of heterogeneity, pooling of the estimates is not recommended, and hence pooled estimates are not provided. The study-level mean change in Eckardt score with the corresponding 95 % CI is shown in a forest plot (Fig. 4).

Fig. 2 Forest plot showing pre- and postoperative Eckardt score data in 14 and 13 studies, respectively

Fig. 3 Forest plot showing pre- and postoperative LES resting pressure data in 11 and 10 studies, respectively

Preoperative LES pressure profiles were reported in 11 studies. Of those, 10 reported postoperative data for comparative analysis. Similar to the Eckardt score, a high level of heterogeneity (Fig. 3) was observed in both pre- and postoperative LES pressures (I² = 98.47 % and I² = 99.41 %, respectively). Overall, the distribution of the mean change in LES pressure between the pre- and postoperative measurements shows a significant improvement in LES pressure. However, a high level of heterogeneity in the mean change from the preoperative value was observed across the studies (I² = 97.10 %). Heterogeneity remained high when prospective and retrospective studies were considered separately in a subgroup analysis (I² = 96.34 % and I² = 93.33 % for prospective and retrospective studies, respectively). As with the Eckardt score, pooling is not recommended due to the high level of heterogeneity. The study-level mean change in LES pressure with the corresponding 95 % CI is provided in forest plots (Fig. 4).

Fig. 4 Forest plot showing mean Eckardt score (pre–post) change data in 13 studies and mean LES resting pressure (pre–post) change in 10 studies

Complications

The most frequently reported complications (Table 4) were mucosal perforation (n = 118), pneumothorax (n = 69), pneumoperitoneum (n = 221), pneumomediastinum (n = 58), subcutaneous emphysema (n = 131), pleural effusion (n = 132), and pneumonia (n = 103). Two studies reported no complications [55, 59]. Another two studies used CT to assess complications [56, 62].

Table 4 Reported complications in 19 selected studies

All complications related to the procedure were managed conservatively using endoscopic clipping, suturing, or hemostasis interventions. Veress needle decompression was commonly used to treat pneumoperitoneum. The majority of pleural effusions and pneumothoraces resolved spontaneously. Only 7 pleural effusions and 3 pneumothoraces required thoracic drainage. One contained perforation of the EGJ required endoscopic drainage; laparoscopy failed to demonstrate the esophageal defect.

There was no mortality reported. None of the POEM procedures had to be converted to surgery. Three patients with persistent symptoms after POEM underwent LHM (n = 2) and pneumatic dilation (n = 1).

Only 4 studies evaluating 147 patients assessed GERD after POEM, using 24-h pH monitoring. The range of abnormal esophageal acid exposure varied from 20 to 53 %; all patients were symptomatically controlled with proton pump inhibitors (Table 4).

Discussion

Laparoscopic Heller myotomy is considered the standard of care for patients with achalasia. However, POEM appears to be a promising alternative to LHM. Our systematic review identified 19 studies assessing the efficacy and safety of POEM, including 10 retrospective and 9 prospective studies. There was no randomized controlled trial. Our analysis of the data revealed important findings: (1) POEM is effective in terms of reducing Eckardt scores and resting LES pressure in patients with achalasia. (2) High heterogeneity of data in reporting outcomes prevents a meta-analysis. (3) POEM appears to be safe in terms of low complication rates. (4) There is a lack of objective esophageal acid exposure assessment using 24-h pH monitoring.

POEM effectiveness and heterogeneity of data

The differences between preoperative and postoperative Eckardt score and LES pressure are highly significant for all the studies considered in our analysis. Two systematic reviews have analyzed the effectiveness of POEM in terms of subjective (Eckardt scores) and objective (LES resting pressure) outcomes through meta-analysis or “pooling” results [71, 72]. However, our study highlights the high level of statistical heterogeneity (I² = 90.24 % and I² = 97.10 % for Eckardt score and LES pressure, respectively), which technically prevented us from performing a meta-analysis.

Pooling of results comparing LHM versus POEM was also not possible due to a lack of reported data (Table 3). Only one study [23] reported pre- and post-Eckardt scores, pre- and post-LES pressure, and postoperative objective pH measurement, comparing 37 POEM versus 64 LHM patients and showing that patient symptoms and esophageal physiology were equally improved.

Safety of POEM

POEM is considered a very safe procedure due to the very low rate of serious complications. The rate of reported complications, however, varies significantly from 0 % to more than 30 % among reviews [71–73]. This variability is likely related to the variability in reporting complications. Furthermore, this new endoscopic procedure shares more features with a surgical procedure than with a classical endoscopic procedure, which creates a significant challenge and a lack of consensus in defining what constitutes a complication. Many of the reported complications, including pneumoperitoneum, pneumomediastinum, full-thickness myotomy, and mucosal perforation, are common occurrences which are easily managed intraoperatively and are often of little to no clinical significance. In this setting, we found two factors that may play a role in the variability of reporting complications. First, the utilization of CO2 instead of ambient air reduces the postoperative rate of complications since CO2 is more absorbable than ambient air. It is well known that during endoscopic submucosal dissection, longitudinal muscles can be exposed or split, allowing air to pass into the mediastinum. The use of a more absorbable gas for insufflation during the procedure may reduce the volume of the leakage. Cai et al. reported on their initial experience with room air in 157 POEM patients compared to their later series of 143 POEM patients for whom CO2 was available. The use of ambient air was associated with an increased rate of pneumothorax (p < 0.001) [45].

Secondly, routine CT scanning can detect small amounts of CO2 post-POEM, which may represent expected post-procedural changes rather than clinical complications. Yang et al. evaluated 108 patients within 30 h after POEM (all with CO2 insufflation). Pneumoperitoneum and/or pneumomediastinum was detected in 53.7 % of patients. There was no statistically significant relationship between the presence of pneumoperitoneum and/or pneumomediastinum detected on CT and the development of complications such as minor inflammation of the lungs, pleural effusion, subcutaneous emphysema, segmental atelectasis of the lungs, and ascites, or severe complications, including delayed hemorrhage, esophageal perforation, and retroperitoneal abscess [74].

GERD after POEM

The incidence and severity of postoperative GERD following POEM remains an important issue. The results in this systematic review vary among the studies depending on the method utilized to evaluate reflux. Comparative pre- and post-POEM endoscopy showed an increased incidence of esophagitis from 0 to 19 % [38, 40, 60]. Although endoscopy provides excellent evidence for the diagnosis of GERD when esophagitis is present, this condition has been found in only 30 % of patients off acid suppression treatment and in an even smaller proportion when treated with PPI [75]. As conventional histology from random biopsies has poor performance for the diagnosis of GERD, ambulatory 24-h pH monitoring is the gold standard in the diagnosis of GERD [76].

Previous experience with Heller myotomy has shown that the postoperative incidence of reflux, measured with 24-h pH monitoring at 6 months, was significantly higher without fundoplication (43 %) compared to the group with partial fundoplication (9 %) [77]. Gastroesophageal reflux after Heller myotomy with partial fundoplication has been found in about 25–35 % of patients and is usually well controlled with medical therapy [78]. Thus, partial fundoplication (Dor or Toupet) is commonly added to Heller myotomy to prevent a high incidence of postoperative reflux. A multicenter, prospective, randomized-controlled trial showed no significant difference in the acid exposure of the distal esophagus among 60 patients who underwent Dor (36) or Toupet (24) fundoplications after Heller myotomy at 6–12 months. Abnormal acid reflux was present in 10 of 24 patients in the Dor group (41.7 %) and in 4 of 19 patients in the Toupet group (21.0 %) (p = 0.152) [79]. Only a few studies in our review reported postoperative 24-h pH monitoring [52, 59, 65, 66]. The range of abnormal 24-h pH monitoring reported in these studies was 20–53 %, which is comparable to previous data on Heller myotomy alone. However, additional studies are needed to further address the question of postoperative reflux in patients undergoing POEM.

Limitations

Our study has several limitations. We aimed to include studies with more than 10 cases and to address pre- and post-op Eckardt scores and/or LES pressure, but that strategy excluded the few studies comparing LHM and POEM. In addition, the length of myotomy is not uniformly performed and reported. In our analysis, only 7 studies differentiated esophageal from gastric length, showing a large variation in gastric (1.1–7 cm), esophageal (3–13 cm), and total (2.6–26 cm) length (Table 3), making the data difficult to compare. Lastly, sigmoid esophagus and the presence of previous treatments may make POEM more challenging and data less comparable. In our analysis, fourteen studies reported the presence of sigmoid esophagus involving 99 patients treated with POEM.

Future research

The role of POEM in the treatment of achalasia should be defined by comparing it with the standard of care. A meta-analysis of RCTs compared different treatment options for achalasia, showing that pneumatic dilation has a better remission rate and a lower relapse rate than botulinum toxin injection (BTI) [9]. At least four RCTs including around 450 patients have compared pneumatic dilation versus LHM, showing better long-term outcomes for the latter [12, 80–82]. Other meta-analyses have confirmed these results, positioning LHM as the standard of care for the treatment of achalasia [8, 10, 11]. Thus, we consider that future research should be directed to compare POEM to LHM in a randomized fashion. Estimation of the number of patients and the appropriate power calculation should be made for future trials. Moreover, the design should consider long-term follow-up (5–10 years) as established with studies comparing PD versus LHM.

Based on our systematic review, we are unable to clearly identify the group of patients where POEM represents a better option than LHM. The main reasons are heterogeneity of the data and the lack of comparisons with LHM. Perhaps, in cases of failed surgical treatment for achalasia, POEM has the advantage of preventing mucosal perforations since it can be performed endoluminally in the posterior esophageal wall, avoiding the scar tissue of previous surgery, but more studies are needed to confirm this hypothesis.

Conclusion

POEM appears to be a promising, effective, and safe option for the treatment of achalasia. However, the high heterogeneity and lack of RCTs make current published data difficult to compare.

RCTs and long-term follow-up studies comparing POEM versus the standard of care (LHM) are needed to further establish the efficacy of POEM in the management of patients with achalasia. Esophageal physiology and symptoms improvement (particularly dysphagia) should be considered endpoints for future comparisons with LHM.

Research should aim to better define complications and standard postoperative changes related to the POEM procedure. In addition, the incidence of GERD following POEM should be studied with objective 24-h pH testing.