Achalasia is a rare idiopathic disease of esophageal motility characterized by dysphagia, chest pain, weight loss, and regurgitation. While the etiology of achalasia is not well established, the disease is characterized by degeneration of ganglia in the lower esophageal myenteric plexus leading to failure of lower esophageal sphincter relaxation [1,2,3,4]. This process is associated with a variety of esophageal motor patterns, ranging from absent contractility to preserved peristalsis [1,2,3,4]. As there is no cure for achalasia, therapy is directed toward symptomatic relief via disruption of the lower esophageal sphincter [4], with the current gold standard therapy being a surgical (Heller) myotomy with partial fundoplication [2,3,4,5,6,7,8,9,10]. However, breakthroughs in advanced endoscopic tunneling techniques have led to the development of per-oral endoscopic myotomy (POEM) as a possible alternative to the gold standard Heller myotomy.

Since the initial report in 2008, more than 20 published studies have described an estimated three thousand or more patients worldwide who have successfully undergone POEM for the treatment of achalasia [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. Despite this increased application of POEM, the role of POEM vs. Heller myotomy (HM) for the treatment of achalasia remains unclear. Meta-analyses of POEM and Heller have demonstrated similar clinical success rates for both procedures; however, a growing number of reports suggest that POEM may be superior to Heller myotomy within certain achalasia phenotypes [1,2,3,4,5, 45,46,47,48,49,50]. Yet only a minority of studies have directly compared surgical myotomy to endoscopic myotomy [37,38,39,40,41, 48, 51, 52], with limited analysis of long-term clinical outcomes.

As our institution was an early adopter of POEM and also has a long history of Heller myotomy for the treatment of achalasia, we sought to compare the long-term clinical efficacy of per-oral endoscopic myotomy to surgical myotomy in regard to active symptomology, freedom from additional therapy, and reflux symptomology.

Methods

A retrospective review of all patients who underwent per-oral endoscopic or Heller myotomy for the treatment of achalasia from 2012–2015 at our institution was performed after approval from our Institutional Review Board. Baseline clinical, demographic, radiographic, and manometric data were collected. When available, all index manometries were reviewed and reclassified in concordance with the current Chicago Classification v3 [2]. Clinical failure was defined as (a) Eckardt score > 3 for at least 4 weeks, (b) hospitalization secondary to achalasia-related complications, or (c) repeat intervention to the lower esophageal sphincter. All patients with clinical failure, regardless of the time interval, were included in the analysis. Patients without noted failure and with < 2.5 years of clinical follow-up information were deemed lost to follow-up and not included in the analysis. A member of the research team (AP) attempted to contact all patients. If the patient did not return the call or could not be reached after three calls, the patient was no longer contacted.
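The failure and inclusion rules above can be expressed as a small sketch. The record layout and field names below are hypothetical and chosen only for illustration; the thresholds (Eckardt score > 3 for at least 4 weeks, 2.5 years of follow-up) are the ones stated in the text. The sketch does not model the separate handling of patients who died before 2.5 years.

```python
from dataclasses import dataclass

MIN_FOLLOW_UP_YEARS = 2.5  # follow-up threshold stated in the Methods


@dataclass
class PatientRecord:
    """Hypothetical record layout; field names are illustrative only."""
    weeks_with_eckardt_gt3: float    # longest period (weeks) with Eckardt score > 3
    achalasia_hospitalization: bool  # hospitalized for an achalasia-related complication
    repeat_les_intervention: bool    # any repeat intervention to the lower esophageal sphincter
    follow_up_years: float           # length of clinical follow-up


def is_clinical_failure(p: PatientRecord) -> bool:
    """Any one of the three stated criteria counts as clinical failure."""
    return (
        p.weeks_with_eckardt_gt3 >= 4
        or p.achalasia_hospitalization
        or p.repeat_les_intervention
    )


def is_included(p: PatientRecord) -> bool:
    """Failures are included regardless of timing; others need >= 2.5 years."""
    return is_clinical_failure(p) or p.follow_up_years >= MIN_FOLLOW_UP_YEARS
```

Under these rules, a patient with an early repeat intervention is analyzed as a failure even with short follow-up, while a patient doing well at 2 years counts as lost to follow-up.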

A pretreatment barium esophagram was reviewed when available for morphology and maximum diameter. Supplementary Fig. 1 serves as a visual reference for the barium variants identified. The “Bird’s Beak” imaging variant refers to the classic appearance of achalasia, with proximal dilatation and gradual tapering (a). The “J-Shaped” imaging variant was defined as any sharp angulation (approx. 90 degrees) in the distal esophagus (b), and the “Corkscrew” variant was defined as multiple short, small positional turns in the esophagus (c). The “Sigmoid” esophageal variant was any esophagus with 2 distinct large angulations, and the “Tubular” esophageal variant was a normal esophagus on esophagram without significant gross dilatation or distal tapering.

Fig. 1 A KM curve for POEM vs HM. B KM curve for Type I Achalasia. C KM curve for Type II Achalasia. D KM curve for Type III Achalasia

Procedural information, including procedural complications, duration of hospital stay, and length of esophageal, gastric, and total myotomy, was collected. Procedural complication was defined as any adverse event in the post-operative setting. Minor and major adverse events were differentiated via the Clavien–Dindo classification [53] (< 2 for minor, ≥ 2 for major). Post-operative esophagram/upper GI series studies were screened for tertiary contractions and delay in esophageal contrast emptying.
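As a minimal sketch of the grading rule above: the mapping below uses the standard Clavien–Dindo grade labels (I, II, IIIa, IIIb, IVa, IVb, V), and the function name is an illustrative assumption. The cut-off (< 2 minor, ≥ 2 major) is the one stated in the text.

```python
# Numeric level of each Clavien-Dindo grade; sub-grades share the level
# of their parent grade.
CLAVIEN_DINDO_LEVEL = {
    "I": 1, "II": 2, "IIIa": 3, "IIIb": 3, "IVa": 4, "IVb": 4, "V": 5,
}


def adverse_event_severity(grade: str) -> str:
    """Classify an adverse event as 'minor' (< 2) or 'major' (>= 2)."""
    return "minor" if CLAVIEN_DINDO_LEVEL[grade] < 2 else "major"
```

With this threshold, only grade I events count as minor; a grade II event, such as one requiring pharmacological treatment, is already classified as major.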

Follow-up data, including last evaluation date with current Eckardt score, follow-up endoscopic data, presence of gastroesophageal reflux symptoms, need for reflux medications/therapies, pH testing, and need for any follow-up therapies for achalasia, were obtained.

Statistical methods

Descriptive statistics are reported as mean (standard deviation [SD]), or number and percentage as appropriate. Demographic, radiographic, manometric, and procedural parameters were compared between POEM and Heller patients using t tests for continuous variables, and Chi-square or Fisher’s exact tests for categorical variables. Univariate and multivariate analyses were performed using Cox proportional hazards regression, and results were reported as hazard ratios (HR) with 95% confidence intervals. Variables with p value < 0.2 on univariate analysis were included in the multivariate model. Time to failure was also assessed using the Kaplan–Meier method, and statistical significance between survival curves was assessed via the log-rank (Mantel–Cox) test. Patients were followed until the date of clinical failure and otherwise censored at their last clinical follow-up date. All statistical tests were two sided, and p values ≤ 0.05 were considered statistically significant. All statistical analyses were conducted using the STATA statistical software package version 15.1 (College Station, Texas).
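For readers less familiar with the Kaplan–Meier method, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the Stata code used for the analysis, and the function name and input layout are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of freedom from failure.

    times  -- follow-up time for each patient (e.g., in years)
    events -- True if the patient failed at that time, False if censored
    Returns a list of (time, survival) pairs, one per distinct failure time.
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Patients censored at time t are still counted as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

As a usage example with invented data (not study data), `kaplan_meier([1, 2, 2, 3, 4], [True, False, True, True, False])` steps down at years 1, 2, and 3 to approximately 0.8, 0.6, and 0.3.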

Results

A total of 69 POEM patients and 61 Heller patients were identified.

In the POEM cohort, 10 were lost to follow-up, and 3 died prior to 2.5 years of follow-up data (one secondary to complications from a pancreatic neuroendocrine tumor, one secondary to a mechanical fall and aspiration pneumonia during that hospitalization, and one secondary to complications related to metastatic prostate cancer and congestive heart failure). Eckardt score was noted to be < 3 in all three patients at the time of death. In addition, one patient’s response to therapy was equivocal. The Eckardt score fell from 10 to 2 and remained persistently low, with EGD and Endoflip detailing widely patent LES with low distensibility. However, an EGD with botulinum toxin injection was performed empirically for diagnostic purposes (in an attempt to alleviate chronic sore throat and eructation) without symptom improvement. Because of this uncertainty, this single patient was not included in the final analysis.

In the Heller myotomy cohort, a total of 15 patients were lost to follow-up, with an additional 3 patients deceased prior to a cumulative 2.5-year follow-up period (one from complications of pneumonia and subsequent respiratory failure, one from complications related to congestive heart failure, and one secondary to acute hypercarbic respiratory failure due to chronic rejection of lung transplant). Again, Eckardt score was noted to be < 3 in all three patients at the time of death. A total of 8 POEM patients and 4 Heller patients were not followed longitudinally at our institution and completed a structured phone interview. Therefore, a total of 55 patients with POEM and 43 patients with Heller, for a total of 98 patients, were included in the analysis and are depicted in Supplementary Fig. 2.
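The cohort flow described above can be checked with a short arithmetic sketch. The counts are taken from the text; the dictionary names are illustrative.

```python
# Counts reported in the Results for each treatment arm.
identified = {"POEM": 69, "Heller": 61}
lost_to_follow_up = {"POEM": 10, "Heller": 15}
died_before_2_5_years = {"POEM": 3, "Heller": 3}
equivocal_response = {"POEM": 1, "Heller": 0}

# Patients remaining in the final analysis, per arm.
analyzed = {
    arm: identified[arm]
    - lost_to_follow_up[arm]
    - died_before_2_5_years[arm]
    - equivocal_response[arm]
    for arm in identified
}

print(analyzed, sum(analyzed.values()))
```

The exclusions reproduce the reported analysis set of 55 POEM and 43 Heller patients, 98 in total.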

Baseline characteristics

Initial demographic, clinical, radiographic, and manometric data are reported in Table 1. Average age, race, and sex were similar between the two cohorts.

Table 1 Baseline characteristics

Mean length of symptoms was 5.50 ± 1.2 years and 6.12 ± 1.5 years for POEM and Heller, respectively. Pre-surgical Eckardt scores were slightly but significantly higher in the POEM cohort compared to Heller (8.73 vs 7.57, p = 0.008). In the group that underwent POEM, achalasia subtype was Type I in 13 patients (23.6%), Type II in 23 patients (41.8%), and Type III in 15 patients (27.3%). Manometry could not be reviewed in two patients; one patient demonstrated esophagogastric junction outflow obstruction (EGJOO) with diffuse esophageal spasm (DES), and one demonstrated EGJOO alone. In the Heller cohort, achalasia subtype was Type I in 12 patients (27.9%), Type II in 14 patients (32.6%), and Type III in 9 patients (20.9%). One patient had EGJOO alone, and one patient had EGJOO in combination with Jackhammer esophagus. Six patients had manometry that could not be re-reviewed for classification. There was no statistical difference in achalasia subtypes between the two treatment cohorts.

Mean IRP was 23.1 ± 2.3 and 26.0 ± 2.8 (p > 0.05) in the POEM and Heller groups, respectively. Pre-operative fluoroscopic studies were available for review in 86 patients: 52 in the POEM cohort and 34 in the Heller cohort. Barium esophagram showed similar maximum diameter at 35.2 vs. 38.8 mm, p = 0.2683. Barium morphology in the POEM cohort was Bird’s Beak in 34 (65.4%), J-Shaped in 10 (19.2%), and normal in 5 (9.6%). No patients had sigmoid esophagus in the POEM group. In the Heller myotomy group, morphology was Bird’s Beak in 20 (58.8%), J-Shaped in 5 (14.7%), sigmoid in 1 (2.9%), and tubular/normal in 1 (2.9%). There was no statistical difference between the groups.

Prior treatment history between the two cohorts was also similar and reflective of clinical care at a tertiary referral center. Sixteen (29.0%) POEM patients and 18 (41.9%) Heller patients reported no previous treatments. Medications were used for prior therapy in 7 POEM and 7 Heller patients (5 nifedipine, 1 oral diltiazem and oral nitrates, 1 sublingual nitroglycerin in the POEM group; 4 sublingual nitroglycerin, 3 oral diltiazem in the Heller group). Dilatations (nonpneumatic) were employed in 43 (78.2%) POEM and 28 (65.1%) Heller patients. Multiple or serial dilatations were performed in 1 (1.8%) POEM patient and 4 (9.3%) Heller patients. Pneumatic dilatation was performed in only 1 POEM patient and 2 Heller patients. Botox injection was performed in 19 (34.5%) POEM patients compared to 8 (18.6%) Heller patients, with approximately 7.0% of patients in both arms having > 2 injections prior to intervention. Previous Heller myotomy was performed in 6 (11.5%) POEM patients and 2 (4.1%) Heller patients. No statistical difference was noted between the cohorts for any of the above variables.

Procedural data

Procedural data are listed in Table 2. A total of 83.7% of Heller patients underwent some form of anti-reflux surgery (75% Toupet, 25% Dor), whereas 16.2% of patients underwent Heller myotomy alone without anti-reflux measures. In the POEM cohort, an anterior myotomy was performed in 90.2% of cases, with the remainder undergoing a posterior approach. The POEM group was associated with a longer esophageal and total myotomy (10.1 ± 4.2 vs 6.9 ± 1.5 and 12.2 ± 4.2 vs 9.5 ± 2.0, both p < 0.001). A longer gastric myotomy was seen in the Heller cohort (2.6 ± 2.5 vs 2.0 ± 0.0, p < 0.001).

Table 2 Procedural characteristics

Length of stay was shorter in the POEM cohort (1.7 ± 1.7 vs 2.6 ± 2.5, p = 0.021).

A total of 7 minor peri-procedural complications were noted in the POEM cohort compared to 9 in the Heller group. The minor POEM complications included a small vallecular tear related to tracheal intubation, capnomediastinum and capnoperitoneum (in two patients), capnoperitoneum, mucosectomy leak requiring conservative management, post-operative atrial fibrillation, and post-operative chest pain, conservatively managed. In the Heller cohort, minor peri-procedural complications included self-limited transaminitis (AST and ALT approximately 500) that resolved spontaneously, post-operative chest pain and tachycardia requiring conservative management, wound infection requiring oral antibiotics, fungal surgical site infection requiring topical antifungal treatment, and post-operative fever (thought to be secondary to sinusitis).

In the POEM group, one major post-operative complication was noted (one patient re-presented 24 h after the operation complaining of significant epigastric and chest pain that required hospitalization for observation). In the Heller group, 3 major complications were seen (1 small intraoperative perforation requiring conservative management, 1 delayed perforation requiring exploratory laparotomy and repair, and 1 case of post-operative mediastinitis requiring ICU admission, IV antibiotics, and serial washout). No statistical difference was identified between minor or major complications between the groups.

Post-operative esophagram data were screened for the presence of delayed contrast emptying and tertiary contractions. There was no significant difference between the groups in the presence of delayed esophageal contrast emptying (38.6% POEM vs. 53.3% Heller, p = 0.320), although the POEM group had a higher rate of tertiary contractions immediately following the procedure (52.4% vs 9.1%, p = 0.023).

Follow-up data

Follow-up data are reported in Table 3. The mean length of follow-up was 3.94 ± 1.15 years in the POEM cohort compared to 5.44 ± 1.81 years in the Heller cohort. Clinical failure was identified in 15 POEM patients (27.3%) and 15 Heller patients (34.9%), for success rates of 72.7% in the POEM cohort and 65.1% in the Heller cohort. In an analysis by achalasia subtype, therapeutic success was 9/13 (69.2%) vs 7/12 (58.3%) for Type I Achalasia in the POEM and Heller groups, respectively. Success was noted in 20/23 (87.0%) POEM patients vs 10/14 (71.4%) Heller patients with Type II Achalasia, and success rates for Type III were 8/15 (53.3%) for POEM vs. 4/9 (44.4%) for Heller. Eckardt score at last follow-up was 0.8 ± 1.0 and 0.7 ± 1.1 for treatment successes in POEM and HM, respectively (p > 0.05), while the last Eckardt score for those with treatment failure was higher in the POEM group (5.9 ± 1.5) vs HM (3.13 ± 2.16) (p < 0.05). Among treatment failures, the median time to failure was 2.78 years for POEM and 1.36 years for Heller (p < 0.05).
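The reported percentages follow directly from the stated counts, as the short sketch below illustrates. The helper name `pct` and the dictionary layout are illustrative; all counts are taken from the text.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)


# Clinical failures out of analyzed patients, per arm.
failure_rate = {"POEM": pct(15, 55), "Heller": pct(15, 43)}

# Successes out of patients with each achalasia subtype, per arm.
success_by_subtype = {
    "Type I": {"POEM": pct(9, 13), "Heller": pct(7, 12)},
    "Type II": {"POEM": pct(20, 23), "Heller": pct(10, 14)},
    "Type III": {"POEM": pct(8, 15), "Heller": pct(4, 9)},
}

print(failure_rate)
print(success_by_subtype)
```

The subtype denominators are small (9 to 23 patients per cell), so a change in outcome for a single patient shifts a cell by roughly 4 to 11 percentage points.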

Table 3 Long-term follow-up

In the POEM failure cohort, a combination of repeat therapeutic modalities was typically employed. However, 3 patients ultimately underwent a repeat POEM, and 3 underwent HM. No patients were treated with pneumatic dilatation following POEM. One was treated with repeat Botox alone. All remained clinically symptomatic despite repeat interventions, with Eckardt scores > 3, with the exception of one patient who improved following Botox injection. One patient with POEM failure was lost to follow-up following two repeat botulinum toxin injections and placement of a gastrostomy tube.

In the Heller failure cohort, multiple treatment modalities were again used, with frequent treatment overlap. One patient underwent POEM, and 4 received a second HM. Four patients were treated with Botox alone, and 1 received a pneumatic dilatation. Following this second intervention, persistently improved Eckardt scores were seen in only two patients (one who underwent pneumatic dilatation and one who underwent repeat HM); however, a repeat Eckardt score could not be calculated in 6 patients, as many were subsequently lost to follow-up.

Figure 1a shows the KM time-to-failure analysis for POEM vs Heller myotomy, and Fig. 1b–d show the time-to-failure analysis for each achalasia subtype (I–III) stratified by procedure. By univariate Cox analysis (Table 4), only Type III Achalasia was significantly associated with treatment failure (HR 2.262, 95% CI 1.0860–4.7108, p = 0.029). Age, pre-procedural Eckardt score, duration of symptoms, and previous treatment with botulinum toxin trended towards significance (p < 0.20) and were included in a multivariate model. However, the final Cox model demonstrated no independently statistically significant markers for failure.
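The reported hazard ratio, confidence interval, and p value for Type III Achalasia can be cross-checked against one another. The sketch below assumes a Wald-type interval that is symmetric on the log-hazard scale, which is the usual form for Cox model output but is not stated explicitly in the text; the function name is illustrative.

```python
import math


def wald_p_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Recover the standard error, z statistic, and two-sided p value
    from a hazard ratio and its 95% CI (assumed symmetric on the log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


se, z, p = wald_p_from_ci(2.262, 1.0860, 4.7108)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under this assumption, the recovered p value agrees with the reported p = 0.029 to three decimal places, so the three reported quantities are internally consistent.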

Table 4 Univariate and multivariate analysis

Patient-reported GERD-related symptoms (heartburn, regurgitation, etc.) were reported in 44.9% of POEM patients and 46.5% of Heller patients. Diagnostic reflux changes were identified in only 1.8% of POEM patients (one peptic stricture requiring dilatation) and 4.7% of Heller patients (one LA grade C esophagitis and one short-segment Barrett’s esophagus). Equivocal GERD pathology was seen in 2 POEM patients (LA grade A esophagitis) and in 5 Heller patients (two patients with LA grade A, two patients with LA grade B, and one patient whose esophagitis was unclassified). LA grade C was present in one patient without an anti-reflux procedure; the remainder of the findings were present in the setting of an anti-reflux fundoplication.

Discussion

To our knowledge, this is both the largest Western long-term study of POEM efficacy and the largest study to compare POEM versus HM at a single institution. Our results demonstrate that POEM and HM have similar long-term efficacy for the treatment of achalasia, with similar complications and rates of reflux. Success for Type III achalasia was numerically higher for POEM, which also resulted in a shorter length of stay when compared to Heller myotomy.

There are several strengths to our analysis. While this is a retrospective cohort study and not a randomized controlled trial, patients were similar across clinical, radiographic, and manometric grounds. One provider performed all POEM procedures (HR) and > 80% of HM procedures were performed by two providers (HR & JL), limiting variations in technical expertise and experience. Also, when available, all manometry tracings were reclassified into the current Chicago Classification v3 criteria to limit era-based confounding. The patients referred for treatment represent a typical phenotype seen at tertiary referral facilities, with a high rate of patients with both anatomical abnormalities and prior treatment attempts. Additionally, the similar efficacy of POEM and Heller in our results mirrors other direct comparisons of Heller and POEM. In a 6-month midterm follow-up study by Ramirez et al., repeat intervention was required in 28.5% of POEM patients and 22.8% of Heller patients [51]. Hanna et al. noted treatment failure in 41% of LHM vs 26% of all POEM patients (median follow-up time 37 and 22 months respectively), and Peng et al. noted treatment failure at 3 years in 25% of POEM patients and 20% of Heller Patients [48, 52].

Safety of both procedures appears excellent and is in line with adverse events reported in prior studies [3]. While reflux symptom rates in both groups were similar and not insignificant, pathologic esophagitis was rarely identified in either the POEM or Heller cohort. These findings are not entirely surprising, as patient-reported symptoms have demonstrated poor reliability for pathologic GERD, and a recent meta-analysis of GERD outcomes in Heller and POEM demonstrated similar rates of clinically apparent disease [54].

Despite these strengths, there are several limitations to our findings. While treatment responses were comparable in the two cohorts, the long-term efficacy is lower than what is typically quoted. This is likely multifactorial, related to the tertiary patient population, anatomic abnormalities, prior treatment attempts before this intervention, long-term nature of the study, and study design. For example, the previous randomized controlled trials [6,7,8] that reported high efficacy when comparing HM to other achalasia therapies had strict inclusion criteria, excluding patients with previous pyloric directed therapies such as botulinum or dilatation. These therapies, while not identified in our univariate or multivariate regression, have been reported to decrease the efficacy of subsequent intervention and were seen with high frequency in our cohorts. Other authors have also noted significant reductions in efficacy of both POEM and Heller in those patients with longer duration of symptoms, higher baseline pretreatment Eckardt score, younger age of onset, and a lower pre-operative LESP (< 38 mm Hg), again factors seen in high frequency across both of our cohorts [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41, 45,46,47, 49,50,51, 55,56,57,58,59]. Data also suggest that anatomic variants may predispose to worse outcomes, which is not surprising given that this often links to duration of disease course [50]. Interestingly, there does exist some heterogeneity in the published success rates of HM for the treatment of achalasia, and our results do match other previously reported outcomes when controlling for achalasia subtype (higher success in Type II compared to either Type I or Type III) [1,2,3,4].

As our report represents our institution’s initial experience with the POEM procedure, a learning curve may also have been a factor, although subgroup analysis (not reported) does not point to a clear transition with our earliest patients. Conflicting thresholds for proficiency have been proposed, with various authors noting improved clinical and technical failure rates after anywhere from 20 to 100 POEM cases [56, 60,61,62].

Additionally, as this was a retrospective cohort study, the decision for Heller myotomy vs POEM was not randomized. This leads to the possibility of a selection bias in which sicker patients were funneled towards the more traditional therapy of Heller myotomy. However, our demographic and clinical data were similar between both groups, arguing against a significant difference in disease severity. Also because of the retrospective nature of the study, the determination of when to perform reintervention was not controlled. As our study defined failure as any GEJ sphincter-directed therapy, uncontrolled use of these modalities as diagnostic tools (using response to LES-directed therapies as proof of LES dysfunction) may predispose our results to lower response rates. Technical aspects of HM vs POEM may also account for some response differences, as dysphagia related to symptomatic wrap abnormalities (overly tight or slipped wrap) could explain the increased failure rates seen in our HM cohort. Additionally, identification of incomplete myotomy, which is associated with higher rates of symptom failure, may be easier in the POEM cohort compared to HM. While we did utilize intraoperative endoscopy to determine myotomy completeness in our HM cohort, intraoperative manometry (IOM) has been shown to significantly decrease the risk of incomplete myotomy [63].

There was also a small but not insignificant number of patients who were lost to follow-up. Among the patients who were initially lost to follow-up but were subsequently reached via telephone, all but one reported minimal ongoing symptomology (Eckardt score < 3), suggesting a possible selection bias in which patients with ongoing symptoms were more likely to seek out care/reintervention.

While this is a relatively large population with long-term follow-up, detailed subgroup analysis may be limited by small sample sizes.

Our results add to the growing body of literature that POEM has a similar efficacy, safety profile, and rates of gastroesophageal reflux as compared to Heller myotomy for the treatment of achalasia. However, our results do highlight the need for a long-term randomized prospective comparison of the two techniques. Secondary to the rarity of achalasia and excellent response rates for both interventions, this likely will require a multi-center approach.