Achalasia (AC) is a rare motility disorder [1] with four main clinical manifestations: dysphagia, regurgitation, chest pain, and weight loss [2]. The diagnosis of AC relies on esophagogastroduodenoscopy (EGD) [3], high-resolution esophageal manometry (HRM) [4], and timed barium esophagogram (TBE) [5]. Given the limitations of conventional therapeutic methods [6], Inoue [7] introduced peroral endoscopic myotomy (POEM). POEM was rapidly adopted because of its short operation time, minimal trauma, fast recovery, and absence of scarring on the body surface.

While the short-term safety and effectiveness of POEM have been extensively studied [8], data on long-term outcomes beyond 2 years are limited. In addition, our follow-up revealed that several patients who met the traditional recurrence criteria, which are based solely on subjective assessment [9], had negative objective examinations and did not require more aggressive treatment such as a second POEM. These patients therefore cannot be considered "true recurrence." This highlights the need for a revised standard that defines recurrence accurately. Furthermore, the factors associated with long-term efficacy remain controversial, and whether the long-term efficacy of POEM can be predicted preoperatively is still debated [9,10,11].

Another challenge is the higher incidence of long-term reflux after POEM [12]. Variations in defining clinical reflux and inconsistencies between subjective and objective reflux assessments contribute to conflicting findings across studies [13,14,15]. Additionally, long-term risk factors for reflux after POEM remain contentious [12].

In this study, we aimed to evaluate the safety and effectiveness of POEM and to introduce two novel definitions for analyzing long-term outcomes. We also reviewed the long-term outcomes and associated factors reported by other major POEM centers and explored the feasibility of preoperative prediction. Additionally, we investigated factors associated with clinical reflux, including symptomatic gastroesophageal reflux disease (GERD) and reflux esophagitis (RE), and summarized other long-term findings on reflux in large cohorts.

Methods

Patients

From November 2012 to March 2021, a cohort of 321 patients diagnosed with AC and treated with POEM was prospectively recruited at the First Affiliated Hospital of Nanjing Medical University. Inclusion criteria encompassed patients diagnosed with AC by symptoms such as dysphagia, chest pain and regurgitation and preoperative tests including EGD, HRM, and TBE [1]. These patients underwent POEM therapy, with a minimum follow-up duration of 2 years and complete follow-up data availability. Exclusion criteria comprised patients who lacked routine follow-up or had less than 2 years of follow-up. The study obtained pertinent clinical information, preoperative evaluations, procedural details, postoperative management, and comprehensive follow-up data. The study adhered to the Declaration of Helsinki (Registration No.: 2023-S2-189).

POEM procedure, perioperative management, and postoperative follow-up

POEM encompassed four procedural steps: mucosal entry, submucosal tunneling, myotomy, and entry closure. Postoperatively, patients received a management regimen comprising 24-h abrosia (fasting), after which the diet progressed gradually from clear liquid to semi-liquid and to a normal diet in 1 month; acid suppression and gastric mucosal protection (an intravenous proton pump inhibitor [PPI] for 3 days, followed by an oral PPI for 4 weeks after discharge); nutritional support (parenteral nutrition during fasting and appropriate fluid replacement after abrosia); and anti-infection measures (intravenous antibiotics for 3 days, followed by 4 days of oral antibiotics). Patients were prospectively followed up at defined intervals: 1 month, 3 months, 6 months, 1 year, and annually thereafter. Follow-up assessments consisted of the Eckardt score [16], GerdQ score [17], TBE, EGD, and HRM.

Definition of clinical failure and reflux

The Eckardt score assesses AC symptoms as the sum of four components (dysphagia, regurgitation, weight loss, and chest pain), each scored from 0 to 3. Traditionally, clinical failure was defined as an Eckardt score ≥ 4. However, our investigation revealed that a subset of patients with an Eckardt score ≥ 4 exhibited no abnormal findings on objective examinations. To address this, we established two criteria for defining clinical failure. The modified criterion required either: (1) retreatment (including pneumatic dilation, POEM, and/or surgical myotomy), or (2) an Eckardt score ≥ 4 accompanied by objective examination findings (such as endoscopy, TBE, and HRM) suggesting recurrence. Endoscopy served to rule out pseudoachalasia caused by cancer and other esophageal diseases [18]. For TBE, a barium height of > 5 cm at 1 min and > 2 cm at 5 min was suggestive of recurrence [19]. For HRM, an IRP ≥ 15 mmHg indicated recurrence [20]. When the Eckardt score was ≥ 4 and pseudoachalasia had been excluded by EGD, recurrence was diagnosed if either HRM or TBE met the objective criteria. In contrast, the normal criterion was defined as: (1) retreatment, or (2) an Eckardt score ≥ 4, without considering objective examination results. Clinical reflux comprised symptomatic reflux and reflux esophagitis [12]. Symptomatic reflux was defined as GerdQ ≥ 8 [17]. Reflux esophagitis was diagnosed on endoscopic examination and graded by the Los Angeles classification [21].
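The two definitions can be summarized as a short decision rule. The following Python sketch is illustrative only and is not the authors' software: the `FollowUp` record and its field names are hypothetical, and treating a missing HRM or TBE result as "not positive" is an assumption that the text does not specify. The thresholds are those stated above.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the definitions in the text.
ECKARDT_THRESHOLD = 4      # Eckardt score >= 4
IRP_THRESHOLD_MMHG = 15.0  # HRM: IRP >= 15 mmHg
TBE_1MIN_CM = 5.0          # TBE: barium height > 5 cm at 1 min
TBE_5MIN_CM = 2.0          # TBE: and > 2 cm at 5 min


@dataclass
class FollowUp:
    """Hypothetical follow-up record; field names are illustrative."""
    eckardt: int
    retreated: bool = False
    pseudoachalasia_excluded: bool = False  # by EGD
    irp_mmhg: Optional[float] = None
    barium_1min_cm: Optional[float] = None
    barium_5min_cm: Optional[float] = None


def hrm_positive(f: FollowUp) -> bool:
    # Assumption: a missing HRM result counts as not positive.
    return f.irp_mmhg is not None and f.irp_mmhg >= IRP_THRESHOLD_MMHG


def tbe_positive(f: FollowUp) -> bool:
    # Assumption: both time points are needed; missing values count as not positive.
    if f.barium_1min_cm is None or f.barium_5min_cm is None:
        return False
    return f.barium_1min_cm > TBE_1MIN_CM and f.barium_5min_cm > TBE_5MIN_CM


def failure_normal(f: FollowUp) -> bool:
    """Normal criterion: retreatment, or Eckardt >= 4 regardless of objective tests."""
    return f.retreated or f.eckardt >= ECKARDT_THRESHOLD


def failure_modified(f: FollowUp) -> bool:
    """Modified criterion: retreatment, or Eckardt >= 4 with objective evidence."""
    if f.retreated:
        return True
    if f.eckardt < ECKARDT_THRESHOLD:
        return False
    return f.pseudoachalasia_excluded and (hrm_positive(f) or tbe_positive(f))


# A symptomatic patient with no objective findings fails only the normal criterion.
symptomatic_only = FollowUp(eckardt=5, pseudoachalasia_excluded=True, irp_mmhg=9.0)
print(failure_normal(symptomatic_only), failure_modified(symptomatic_only))  # True False
```

Under this reading, any patient who fails the modified criterion also fails the normal criterion, which is why the modified failure count can never exceed the normal one.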

Statistical analysis

Data analysis was performed using SPSS 27.0 software. Categorical variables, expressed as counts (percentages), were analyzed by the chi-squared test or Fisher’s exact test. Continuous variables, expressed as mean (standard deviation) or median (interquartile range [IQR]), were analyzed by the t-test (normally distributed data) or Wilcoxon's rank sum test (skewed data). A P value < 0.05 was considered statistically significant.
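For readers unfamiliar with Fisher’s exact test, the sketch below shows how a two-sided P value is obtained for a 2 × 2 table. It is a minimal standard-library illustration, not the SPSS procedure used in this study; the function name and the counts in the example are hypothetical and are not taken from our data.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> Fraction:
        # Hypergeometric probability of k events in row 1 with all margins fixed.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), denom)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table that is no more likely than the observed one.
    total = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed)
    return float(total)


# Hypothetical counts: 3 of 4 events in one group versus 1 of 4 in the other.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P = {p:.4f}")  # P = 0.4857
```

Exact fractions are used so that the comparison with the observed table's probability is not affected by floating-point rounding.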

Results

Patient and procedure characteristics

A total of 321 AC patients were included in the study. The median follow-up was 52 (IQR 32, 73) months. The median age of patients was 43.79 years, and 144 (44.86%) were male. Fifty patients (15.58%) had undergone prior treatments, including pneumatic/bougie dilation in 13 patients, esophageal stent placement in 18 patients, laparoscopic Heller myotomy (LHM) in 1 patient, previous POEM in 1 patient, and multiple treatments in 10 patients. The median duration of disease was four years, and the preoperative Eckardt score was six. Sigmoid-type esophagus was detected by TBE in 18 patients (5.61%). Lower esophageal sphincter pressure (LESP) and integrated relaxation pressure (IRP) measured by HRM were 34.80 ± 15.60 and 27.93 ± 11.91 mmHg, respectively. Achalasia subtypes [22] were type 1 in 36 patients (11.21%), type 2 in 224 patients (69.78%), type 3 in 3 patients (0.93%), esophagogastric junction outlet obstruction (EGJOO) in 7 patients (2.18%), and unknown in 51 patients (15.89%). The mean operation time was 59.84 ± 22.57 min. The mean myotomy length was 8.20 ± 1.94 cm on the esophageal side and 2.09 ± 1.13 cm on the gastric side. No POEM procedure was aborted. Adverse events occurred in 15 patients (4.67%), including 2 patients with esophageal fistula, 2 patients with esophageal mediastinal fistula, 4 patients with pulmonary infection, 3 patients with pleural effusion, and 4 patients with multiple adverse events. The average hospitalization duration was 8.22 ± 3.43 days (Table 1).

Table 1 Characteristics of patient population and perioperative metrics

Analysis of POEM failures

According to the modified criterion, 23 failures occurred (7.17%), whereas 47 failures occurred under the normal criterion (14.64%). Fifteen patients received retreatment after their original POEM, including 14 who underwent a second POEM and 1 who underwent pneumatic dilation (PD). Under the modified criterion, clinical failure was significantly associated with hospitalization duration (P = 0.027), and the dysphagia score showed a non-significant trend (P = 0.089) (Table 2). Other factors, such as gender (P = 0.464), age (P = 0.510), body mass index (BMI) (P = 0.430), previous treatment (P = 0.961), disease duration (P = 0.289), preoperative Eckardt score (P = 0.873), sigmoid-type esophagus (P = 0.843), LESP (P = 0.422), IRP (P = 0.603), and adverse events (P = 0.144), did not exhibit a significant relationship with clinical failure. Under the normal criterion, esophageal myotomy length was significantly associated with long-term efficacy (P = 0.039). Duration of disease (P = 0.096), weight loss score (P = 0.093), and length of myotomy on the gastric side (P = 0.086) showed only non-significant trends. Similarly, gender (P = 0.543), age (P = 0.682), BMI (P = 0.403), previous treatment (P = 0.321), preoperative Eckardt score (P = 0.621), sigmoid-type esophagus (P = 0.926), LESP (P = 0.543), IRP (P = 0.735), adverse events (P = 0.820), and hospitalization (P = 0.275) were not significantly related to long-term efficacy (Table 3).

Table 2 Analysis on the associated factors of postoperative efficacy (modified criterion)
Table 3 Analysis on the associated factors of postoperative efficacy (normal criterion)

Analysis of symptomatic reflux

All 321 patients underwent at least one evaluation for reflux symptoms, and 52 (16.20%) were identified as having symptomatic reflux (GerdQ ≥ 8). We then analyzed the factors associated with GERD. Chest pain score (P = 0.070) and esophageal myotomy length (P = 0.090) showed only non-significant trends toward an association with symptomatic reflux. No significant associations were found between GERD and other factors, including gender (P = 0.155), age (P = 0.848), BMI (P = 0.317), previous treatment (P = 0.967), preoperative Eckardt score (P = 0.639), sigmoid-type esophagus (P = 0.351), LESP (P = 0.947), IRP (P = 0.726), gastric myotomy length (P = 0.971), and adverse events (P = 0.960) (Table 4).

Table 4 Analysis on the associated factors of GERD

Analysis of reflux esophagitis

Due to the impact of the COVID-19 pandemic and patient preferences, only 88 patients underwent EGD during follow-up. Consistent with other research, reflux esophagitis was detected in 22 patients (25%) (7 grade A, 13 grade B, and 2 grade C by the Los Angeles classification), while no reflux-related complications, such as Barrett's esophagus, were observed. Seven patients with clinical failure under the normal criterion had RE, compared with one patient with clinical failure under the modified criterion. In addition, among the 52 patients with GERD symptoms, 22 completed endoscopy, of whom 8 had RE and 14 did not. Gender (P = 0.010), LESP (P = 0.013), IRP (P = 0.015), and esophageal myotomy length (P = 0.032) were significantly associated with RE. Gastric myotomy length (P = 0.064) showed a non-significant trend. Other factors, such as sigmoid-type esophagus (P = 1.000), preoperative Eckardt score (P = 0.934), operation time (P = 0.286), and adverse events (P = 0.570), did not demonstrate a statistically significant relationship with RE (Table 5).

Table 5 Analysis on the associated factors of RE

Discussion

Our institution achieved an overall clinical success rate of 92.83% (modified criterion) and 85.36% (normal criterion) at a median follow-up of 52 months. Our success rate under the normal criterion was similar to those of other large long-term studies (Supplement Table 1). However, this result included 24 patients with “false recurrence” (Eckardt ≥ 4 points and negative objective tests). These cases might be attributed to the absence of esophageal body peristalsis, leading to elevated Eckardt scores for dysphagia [6]. Additionally, increased regurgitation and chest pain scores could be related to clinical reflux. Patient dissatisfaction with postoperative outcomes might also contribute to falsely high Eckardt scores during follow-up. Therefore, we established a modified standard to identify “true recurrence” and explored the factors associated with long-term efficacy.
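The rates above follow directly from the failure counts reported in the Results. The short Python check below reproduces the arithmetic; the helper `rate_pct` and the variable names are illustrative only.

```python
def rate_pct(events: int, total: int) -> float:
    """Percentage of events in total, rounded to two decimals."""
    return round(100 * events / total, 2)


COHORT = 321            # patients included
FAILURES_MODIFIED = 23  # failures under the modified criterion
FAILURES_NORMAL = 47    # failures under the normal criterion

failure_modified = rate_pct(FAILURES_MODIFIED, COHORT)           # 7.17
failure_normal = rate_pct(FAILURES_NORMAL, COHORT)               # 14.64
success_modified = rate_pct(COHORT - FAILURES_MODIFIED, COHORT)  # 92.83
success_normal = rate_pct(COHORT - FAILURES_NORMAL, COHORT)      # 85.36

# Patients counted as failures only by the symptom-based (normal) criterion.
false_recurrence = FAILURES_NORMAL - FAILURES_MODIFIED           # 24

print(failure_modified, failure_normal, success_modified, success_normal, false_recurrence)
```

The difference between the two failure counts is the "false recurrence" group, that is, patients with an Eckardt score ≥ 4 but negative objective tests.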

Some studies suggest that no specific risk factors are associated with long-term efficacy [23, 24], while others have identified various factors such as preoperative Eckardt score [11, 16], previous treatment [11, 25,26,27], AC Chicago type [11, 28,29,30], sigmoid esophagus [10, 27, 31, 32], disease duration [11], age [33, 34], intraprocedural mucosa injury [9], and myotomy length above esophagogastric junction (EGJ) ≤ 8 cm [35] as potential predictors of clinical failures. In our study, hospitalization exhibited a significant relationship with efficacy under the modified criterion, while esophageal myotomy length was statistically correlated with clinical failures under the normal standard.

To our knowledge, this is the first report demonstrating the influence of hospital stay on the long-term efficacy of POEM. According to the modified standard, the recurrence group had 3 patients (13.04%) with complications and hospital stays of 12, 15, and 30 days, while the non-recurrence group had 12 patients (4.03%) with complications and hospital stays ranging from 6 to 26 days. Previous studies have indicated a relationship between hospitalization duration and adverse events (mostly occurring in the early stage) [36], and patients with adverse events may be more susceptible to clinical failure, possibly because of the learning curve [37]. In addition, endoscopists may have made minor technical errors that were not recorded in the reports. Moreover, in the early stage, operators tend to be cautious about discharging patients, leading to longer hospital stays [38]. Studies evaluating the feasibility of same-day discharge after POEM have shown that it can be safe for selected patients [39]. In summary, our center's data suggested that longer hospital stays were associated with clinical failure under the modified criterion. Thus, for patients with extended hospital stays, close postoperative follow-up is warranted to detect potential recurrence early.

Longer esophageal myotomy length was identified as a potential factor in clinical failure, which may challenge conventional assumptions. Previous studies have suggested that incomplete esophageal incision could lead to an inadequate reduction in LES pressure and subsequent recurrence [35]. Notably, an incision length ≤ 8 cm above the EGJ was reported as a risk factor for POEM failure [40]. One possible explanation for our finding is that longer myotomies might cause extra damage, further increasing the risk of recurrence [41]. Additionally, longer myotomies may contribute to prolonged operative time and a higher likelihood of insufflation-related adverse events [42]. Recent evidence suggested that short myotomies could achieve comparable clinical outcomes while being more cost-effective [41]. The average LES length in AC patients was reported to be around 3.6 cm [43], suggesting that a shorter myotomy of approximately 6 cm could effectively reduce LES pressure. However, for type 3 AC patients, a longer esophageal myotomy was required, based on the length of the spastic segment on HRM [44]. Therefore, the optimal length of the esophageal myotomy requires further validation through high-quality multicenter trials.

To date, only three prediction models have been established for preoperative assessment of POEM failure. Zhou et al. [9] proposed a model that included previous treatment, intraprocedural mucosal injury, and clinical reflux. However, this model had limited practical value for preoperative prediction because it included intraprocedural and postoperative factors. Satoshi et al. [10] established a risk-scoring model that categorized risk groups preoperatively based on a new definition named poor responders. Although this model indirectly predicted patients more likely to require retreatment, it did not directly predict POEM failure. Inoue et al. [11] developed a risk-scoring system that showed promise, incorporating only pretreatment factors and demonstrating good calibration and precision. However, this system is limited in its discriminative capacity and applies to short- and mid-term outcomes only. As a result, the long-term efficacy of POEM remains unpredictable at present.

Clinical reflux is a major concern following POEM. In our cohort, symptomatic reflux occurred in 52 patients (16.20%), while RE was detected in 22 patients (25.0%). These figures fall within the ranges reported in previous literature [13], in which the incidence of symptomatic reflux and RE after POEM was 9–43% and 13–68%, respectively. Our results align with those reported in most Asian studies. Reported risk factors for short-term reflux after POEM include the absence of anti-reflux procedures after the incision [45], full-thickness myotomy [46], a posterior approach [47], and a gastric myotomy over 2.5 cm [48]. However, the understanding of long-term reflux factors remains limited. In addition, variations in GERD definitions across studies and the poor correlation between reflux symptoms and endoscopic findings have further complicated the identification of risk factors [14, 15]. Therefore, we explored risk factors for symptoms and esophagitis separately. We found no specific risk factors for symptomatic reflux, whereas gender, preoperative LESP, preoperative IRP, and the length of esophageal myotomy were potential factors associated with RE. Reflux analyses from other centers are summarized in Supplement Tables 2 and 3.

In our study, we defined a GerdQ score of ≥ 8 as indicative of reflux symptoms [17]. However, symptoms may not solely indicate true reflux (volume reflux resulting from the reduction of LES resting pressure after POEM). Non-reflux esophageal acidification due to stasis or acid fermentation, and esophageal hypersensitivity to chemical or mechanical stimuli, may also contribute to symptomatology [49]. Additionally, esophageal hyposensitivity resulting from mucosal denervation during submucosal tunneling and myotomy would lead to symptomatic reflux in specific patients [14]. While no risk factors for long-term symptomatic reflux were identified in our cohort, the considerations above, together with female sex as reported by others [15, 50], still guide our approach to postoperative follow-up.

Our study revealed a correlation between male gender and RE. This aligns with previous studies indicating that men tend to exhibit higher baseline and maximum acid production [51], which can contribute to the development of RE. However, the studies by Ayazi [15] and Nabi [50] reported the opposite. Further research in multicenter cohorts is warranted to explore the impact of gender differences and to support individualized treatment.

Low LESP was associated with esophagitis in both the study by Ayazi [15] and ours. LESP plays a crucial role in maintaining the integrity of the reflux barrier. Some researchers [52] found that patients with esophagitis had significantly lower resting pressure than healthy volunteers. Furthermore, the probability of severe esophagitis increased incrementally with every 10 mmHg decrease in preoperative resting pressure. Our study also identified low preoperative IRP as a risk factor for RE. An abnormal increase in IRP is characteristic of patients with AC, and a decrease in IRP after POEM (mostly to < 15 mmHg) is indicative of remission. Patients with low preoperative IRP were more susceptible to developing esophagitis because of similar acid exposure without an evident decrease in IRP following POEM [46]. Further multicenter studies are needed to confirm the association between LESP, IRP, and RE.

Our study indicated that esophageal myotomy length exhibited a significant relationship with RE. Current studies have demonstrated an association between longer esophageal myotomy length and increased postoperative abnormal acid exposure [48], which may contribute to an elevated risk of RE. The exact underlying mechanism for this relationship remains unclear. However, one possible explanation is that an excessively long myotomy could result in the disruption of circular and longitudinal muscles, compromising the anti-reflux function of the LES. Further research is needed to elucidate the internal mechanisms.

Our study provided valuable insights into the long-term safety and effectiveness of POEM in a large clinical center, with a follow-up period exceeding two years. However, several limitations remain. First, this was a single-center, retrospective study with inevitable bias, and because most patients were from East China, the generalizability of our results to other populations may be limited. Second, several assessments after POEM were missing because patients did not tolerate examinations such as HRM.

In conclusion, our large cohort demonstrated favorable long-term outcomes of POEM. We introduced two novel definitions of clinical failure after POEM, with failure rates of 7.17% under the modified criterion and 14.64% under the normal criterion. Shorter hospitalization and shorter esophageal myotomy length may decrease the incidence of long-term clinical failure after POEM according to the modified and normal criteria, respectively. We also summarized other risk factors related to POEM failure and highlighted that preoperative prediction of long-term efficacy is not yet practical. Moreover, no factor was associated with post-POEM symptomatic reflux, whereas for male patients with low preoperative LESP and IRP, a shorter esophageal myotomy may help prevent reflux esophagitis.