Introduction

Dysphagia after anterior approaches to the cervical spine is a common occurrence. It is attributed to the retraction of the pharynx, trachea, and esophagus and soft tissues across midline needed to expose the operative segments. The clinical manifestations of dysphagia include difficulty swallowing, choking, or a “sticky throat” feeling. Reported incidences of dysphagia following anterior cervical surgery have ranged from 1 to 83% [1,2,3,4,5,6,7,8,9], with about 35% persisting to chronic dysphagia [8]. While most dysphagia is self-limiting and resolves within a few weeks post-operatively, severe swallowing disorders can considerably impact quality of life and might even lead to death [7].

This condition must be differentiated from neural or structural damage to vital structures, which needs to be recognized more urgently because it carries a greater risk of severe complications. While rare, recurrent laryngeal nerve injury, laryngeal nerve palsy, esophageal perforation, and hematoma causing mass effect are all causes of post-operative dysphagia that need to be recognized early to avoid catastrophic outcomes. Additionally, there is a need to identify the dysphagia patients at risk of aspiration, since dysphagia is a major risk factor for aspiration pneumonitis [10]. Furthermore, dysphagia must be differentiated from odynophagia, which is quite common in this subset of patients and is defined as pain with swallowing, a separate entity from the swallowing dysfunction that defines dysphagia.

Patient reported outcome measures (PROMs) have been used in the past to assess swallowing difficulties and report their incidence in surgical series, but none have been universally accepted or validated as a standard for assessment. The main issue is that PROM scales are susceptible to reporting, detection, performance, and attrition biases, and might lead to unfounded recommendations. An accurate, predictive scale should be sensitive, non-invasive, low cost, reliable, and unbiased, and it should suggest an etiology of the dysphagia that can be addressed.

The main obstacle to the development of these scales is that the swallowing mechanism is a complex, coordinated sequence of neuromuscular events that allows transit of a bolus down the esophagus without causing aspiration. Direct flexible endoscopic evaluation of swallowing and the Modified Barium Swallow (MBS), the gold standard of evaluation, identify several potential abnormalities, such as vallecular residue with poor clearance, decreased or absent epiglottic inversion, reduced pharyngeal wall movement, decreased laryngeal excursion, and upper esophageal sphincter dysfunction. It is not clear how the anterior approach to the cervical spine affects these mechanisms, or what combination of patient characteristics, surgical techniques, and instrumentation might be responsible for these abnormalities. Identifying each of these is our ultimate goal, and this paper serves as the literature review framework for our further research. Thus, we chose to survey the literature for the subjective and objective measures used to classify dysphagia, and to describe and analyze them in the context of post-operative dysphagia after anterior cervical spine surgery. We focused on the most common anterior cervical spine procedure, the anterior cervical discectomy and fusion (ACDF), because most of the literature is specific to this procedure. In a focused review of the current ACDF literature, we evaluated the frequency and severity of postoperative dysphagia and assessed the validity and reliability of the tools currently used to measure it. Our aim is to provide a thorough assessment of the various scales used to describe this condition and to recommend standard measurement practices for surgeons who encounter it postoperatively. From there, we will expand our future studies to all anterior cervical spine surgeries and to identifying preoperative and intraoperative patient factors, techniques, and instrumentation characteristics that are associated with worse postoperative dysphagia.

Methods

We searched PubMed starting in February of 2021 using the terms “anterior cervical discectomy and fusion” and “dysphagia or postoperative dysphagia.” Articles recommended by the principal investigators (Paré and Postma) and those found by manually reviewing the references of articles acquired in the above searches were reviewed as well. We selected papers published in English that were meta-analyses, systematic reviews, or prospective or retrospective studies. Exclusion criteria included case reports, cadaveric or experimental studies in animals, and lack of a quantitative dysphagia scale. Studies were also excluded if the scales used were inappropriate to dysphagia. Our selection was further consolidated via abstract and title screening. Ultimately, nineteen articles were included in the literature review. The selection process of these articles can be seen in Fig. 1. Two authors then conducted full-text review of these articles.

Fig. 1 PRISMA systematic review diagram

Data concerning dysphagia were extracted from the studies that remained after the screening process. Pertinent details included number of patients, any significant preventable/unpreventable risk factors, dysphagia scale(s) used, and incidence rate of dysphagia, including experimental and control groups based on each study. No quantitative analysis was conducted; therefore, statistical methodology was not necessary. The level of evidence rating was assigned to each article by two authors (Nijim and Cowart) using published guidelines [11]. The list of articles, as well as their level of evidence ratings, can be seen in Table 1.

Table 1 Studies included within the literature review

Results

Bazaz grading system

Rosenthal et al. [12] analyzed the validity and reliability of Bazaz score when compared to EAT-10 in determining PD after one, two, or three-level ACDF. 32% of EAT-10 positive cases of dysphagia were otherwise scored as ‘None’ by the Bazaz scale. Bazaz was found to miss significant cases of dysphagia that were picked up by EAT-10.

Huang et al. [4] stated that a limitation of their retrospective analysis was that they used the Bazaz Grading system, which is based solely on the subjective viewpoint of the patient; therefore, it was unreliable. They stated a more objective method of measuring PD should be used in the future.

Liu JM et al. [13] created a new dysphagia scoring system, where pharyngeal pain and foreign body sensation were the major factors accounted for in the system. The patients that had undergone ACDF were given this new scoring system, as well as the Bazaz grading scale. The new dysphagia scoring system was more detailed than the Bazaz scale, and the two had a good correlation with a correlation coefficient greater than 0.65 (p < 0.001).

Skeppholm et al. [14] found in a prospective study that the Bazaz score did not correlate with the DSQ, MDADI, or the EQ-5D.

Liang et al. [15] found a low correlation coefficient when comparing the Bazaz score with the MDADI, supporting the conclusion that the system has low validity in measuring the severity of PD (r = −0.63). Furthermore, they found that Bazaz had a low diagnostic accuracy in determining mild or moderate to severe PD (AUC < 0.90). The average time to conduct was 0.5 min.

Eating assessment tool (EAT-10)

Four studies, three prospective and one retrospective, studied the EAT-10 tool and its ability to measure the severity of dysphagia and its effects on quality of life effectively.

Cheney et al. [16] specifically evaluated the ability of the EAT-10 tool to evaluate the risk of aspiration, which is one of the major risks in patients with dysphagia. The mean EAT-10 was 16.08 (± 10.25) for non-aspirators and 23.16 (± 10.88) for aspirators (P < 0.0001). The sensitivity and specificity for predicting aspiration in these patients were found to be 71% and 53%, respectively. Subjective dysphagia symptoms recorded using EAT-10 were found to be useful in predicting aspiration.
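As a reminder of how sensitivity and specificity figures such as these are derived, the following is a minimal sketch in Python. The function name and the patient counts are hypothetical and were chosen only so that the output matches the reported 71% and 53%; they are not the counts from Cheney et al. [16].

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: aspirators flagged by the screening score
    fn: aspirators missed by the screening score
    tn: non-aspirators correctly not flagged
    fp: non-aspirators flagged by the screening score
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one aspirator and one non-aspirator")
    sensitivity = tp / (tp + fn)  # share of true aspirators detected
    specificity = tn / (tn + fp)  # share of non-aspirators correctly cleared
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=71, fn=29, tn=53, fp=47)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Read this way, a specificity of 53% means that roughly half of the patients who do not aspirate would still be flagged by the score.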

Belafsky et al. [17] performed a prospective cohort study of the EAT-10 dysphagia scale. It was found to have excellent internal consistency (α = 0.96), test–retest reproducibility (coefficient range 0.72–0.91), and criterion-based validity. Its function was recognized as documenting initial dysphagia severity, as well as monitoring the response to treatment in persons with swallowing disorders of multiple discrete etiologies. The data collected suggested that scores of 3 or higher are considered abnormal. EAT-10 scores were significantly higher in patients with oropharyngeal dysphagia, esophageal dysphagia, and history of head/neck cancer compared to patients with reflux disease and voice disorders (p < 0.001). EAT-10 scores of these patients were significantly lower after treatment (p < 0.001).
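The scoring rule implied here can be sketched in a few lines of Python. The sketch assumes the structure described in this review (10 items, each scored 0 to 4, summed to a total) and the abnormal cutoff of 3 reported by Belafsky et al. [17]. The function and constant names are illustrative only and are not part of the published instrument.

```python
EAT10_ITEM_COUNT = 10       # the EAT-10 has 10 items
EAT10_ITEM_MAX = 4          # each item is scored 0-4
EAT10_ABNORMAL_CUTOFF = 3   # totals of 3 or higher are considered abnormal [17]


def eat10_total(responses):
    """Sum the ten item scores after checking that each is an integer 0-4."""
    if len(responses) != EAT10_ITEM_COUNT:
        raise ValueError("EAT-10 requires exactly 10 item responses")
    for score in responses:
        if isinstance(score, bool) or not isinstance(score, int):
            raise ValueError("item scores must be integers")
        if not 0 <= score <= EAT10_ITEM_MAX:
            raise ValueError("item scores must be between 0 and 4")
    return sum(responses)


def eat10_is_abnormal(responses):
    """Return True when the total meets the abnormal cutoff."""
    return eat10_total(responses) >= EAT10_ABNORMAL_CUTOFF


# Hypothetical patient responses, for illustration only.
example = [1, 0, 2, 0, 0, 1, 0, 0, 0, 0]
print(eat10_total(example), eat10_is_abnormal(example))
```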

Ohba et al. [18] performed a prospective study evaluating PD after ACDF with EAT-10 scale and the Hyodo-Komagane score (HK), which was collected with flexible endoscopy. This paper found that the HK scoring method, a more objective evaluation of swallowing, was more likely to detect dysphagia. 8.5% of subjects had evidence of dysphagia in the preoperative period on endoscopy compared to 0% when evaluated with EAT-10. HK scores were found to be significantly more sensitive than EAT-10 scores (p < 0.05). However, positive correlation between HK and EAT-10 scores (r = 0.61, p < 0.001) was established in the early postoperative period, making both methods credible to assess PD.

Rosenthal et al. [12] conducted a prospective study evaluating PD after ACDF with Bazaz and EAT-10 scales. EAT-10 had excellent internal reliability (α = 0.95) and significant positive correlation to Bazaz severity score (r = 0.82). 32% of EAT-10 positive cases of dysphagia were otherwise scored as ‘None’ by the Bazaz scale.
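The internal consistency values (α) quoted throughout these results are Cronbach's alpha. For readers less familiar with the statistic, a minimal sketch of the standard formula follows, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical and do not come from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four patients on a two-item scale.
responses = [[0, 1], [1, 0], [2, 3], [3, 2]]
print(round(cronbach_alpha(responses), 2))
```

Values close to 1 indicate that the items move together across patients, which is how the α figures reported for EAT-10, SWAL-QOL, and HSS-DDI are interpreted in this review.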

Dysphagia short questionnaire

Rosenthal et al. [19] found that cumulative DSQ scores correlated with the MDADI, which is a dysphagia scoring system for patients with head and neck cancers. DSQ scores were also found to have significant test–retest reproducibility (r = 0.61, p < 0.01). Furthermore, the scores were found to change after ACDF, supporting that they reflected clinical outcomes.

Skeppholm et al. [14] conducted a prospective validation study examining whether the DSQ could measure dysphagia in patients undergoing ACDF. The DSQ correlated with the MDADI, which is an already validated dysphagia scoring system (r = 0.59). The DSQ also had a significant correlation with the EQ-5D (p < 0.05). Additionally, the DSQ showed good reproducibility.

Liang et al. [15] found that the DSQ had weak internal consistency and reliability (α = 0.454). Additionally, they established that the correlation coefficient between the DSQ and MDADI was low, suggesting a low validity for the test (r = − 0.64). It was found that the DSQ was ineffective in diagnosing moderate to severe PD (Area under ROC Curves (AUC) < 0.9). The average time to conduct was 1.2 min.

Swallowing quality of life questionnaire (SWAL-QOL)

Cordier et al. [20] gathered SWAL-QOL survey data from 507 patients at risk for oropharyngeal dysphagia (OD), 75.7% of whom had OD confirmed by Modified Barium Swallow. Using Rasch analysis, they found low person reliability for most of the subscales in the survey (0.47–0.73), which means more items are required within each subscale to stratify patients more effectively. Item reliability for all the scales was above 0.9, meaning the hierarchy of the scales is accurate.

Mayo et al. [21] did a retrospective study to determine which parts of the 44-item SWAL-QOL survey are most pertinent to assessing dysphagia following ACDF. They found that only 16 questions showed statistical and clinical significance from preoperative to postoperative values, meaning a shortened version of SWAL-QOL could be used for ACDF patients (p < 0.05). The shortened survey was found to have strong internal consistency and reliability (α > 0.9).

Okano et al. [22] found that the minimum clinically important difference (MCID) of the SWAL-QOL was 9 of 100, which means that improvements of less than 9 points on the SWAL-QOL would yield clinically insignificant results for patients.

Hospital for special surgery dysphagia and dysphonia inventory (HSS-DDI)

Hughes et al. [23] conducted a multiphase study to develop and evaluate the validity and reliability of the HSS-DDI in assessing dysphagia after ACDF. When administered to 49 patients after ACDF, the test had α = 0.97, demonstrating high internal consistency. In the final phase of the study, they established external validity of the test, with correlation coefficients ranging from 0.5 to 0.7 when compared to the SWAL-QOL and MDADI surveys. Internal validity of the test was shown by a worsening HSS-DDI score as the number of vertebral levels in ACDF increased (p = 0.02). The average time to administer this test was 2 min and 25 s.

Liang et al. [15] evaluated 132 patients after ACDF with the HSS-DDI and HSS-Dysphagia subscale and found that the test has a very high reliability (α = 0.969, α = 0.). The correlation coefficients between the HSS-DDI and HSS-Dysphagia subscale with the MDADI were high, suggesting these tests both have high validity (r > 0.7). The receiver operating characteristic (ROC) curves supported these tests having high diagnostic accuracies in determining mild and moderate to severe PD (AUC > 0.9). Lastly, the times to conduct the HSS-DDI and HSS-Dysphagia are 5.8 and 3.5 min, respectively.

Okano et al. [22] found that the minimum clinically important difference (MCID) of HSS-DDI test was 10 points, signifying improvements of less than 10 points on this test would not be perceived by patients.
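The way an MCID is applied to serial scores can be sketched briefly. The thresholds below are the values attributed to Okano et al. [22] in this review (9 points for the SWAL-QOL and 10 points for the HSS-DDI). The function and dictionary names are hypothetical, and the sketch compares the absolute change so that it makes no assumption about which direction represents improvement on either scale.

```python
# MCID thresholds in scale points, as reported in this review [22].
MCID_POINTS = {"SWAL-QOL": 9.0, "HSS-DDI": 10.0}


def exceeds_mcid(instrument, baseline, follow_up):
    """Return True when the change between two scores reaches the MCID."""
    if instrument not in MCID_POINTS:
        raise ValueError(f"no MCID recorded for {instrument!r}")
    change = abs(follow_up - baseline)  # magnitude only; direction not assumed
    return change >= MCID_POINTS[instrument]


# Hypothetical scores, for illustration only.
print(exceeds_mcid("SWAL-QOL", baseline=60, follow_up=68))  # 8-point change
print(exceeds_mcid("HSS-DDI", baseline=50, follow_up=60))   # 10-point change
```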

Modified Barium Swallow (MBS)

Nordin et al. [24] investigated how speech-language pathologists (SLPs) develop the ability to use objective, standardized MBS protocols to accurately measure the severity and location of the impairment causing dysphagia. They found that all the SLPs, irrespective of their level of experience, were able to attain 80% accuracy in their measurements when compared to three expert clinicians within 8 weeks. As the accuracy of their measurements increased, their time to administer decreased (p < 0.05). Their mean time for completing MBS was 25 min.

Hawkins et al. [25] studied whether MBS in addition to a barium esophagram could yield more accurate diagnostic results for dysphagia. They found that 85.1% of normal MBS or esophagram findings were paired with abnormal esophagram or MBS findings, respectively. Therefore, performing both studies in conjunction increases the diagnostic accuracy for dysphagia.

Watts et al. [26] found that standardizing MBS with an esophageal sweep protocol improved the test’s ability to diagnose esophageal dysphagia and oropharyngeal dysphagia (p < 0.05).

Flexible endoscopic evaluation of swallowing (FEES)

Hiss et al. [27] reviewed the technique, interpretation, predictive value and safety of FEES. In the authors’ review of four prospective studies, they concluded that FEES was at least equivalent if not superior in sensitivity and specificity concerning penetration and aspiration compared to MBS.

Giraldo-Cadavid et al. [28] performed a systematic review and meta-analysis of six articles that found high sensitivity for FEES concerning bolus aspiration, penetration, and residue. The sensitivity for FEES was significantly higher than for videofluoroscopic swallow studies in the previously mentioned categories.

Erwood et al. [29] measured observer variability when using FEES to assess PD following ACDF by using two expert SLPs to independently evaluate images from patients. There was a reliability coefficient (κ) of 0.77 for the preoperative Penetration-Aspiration Scale. The post-operative Swallowing Performance scale conveyed strong agreement between the experts with a Kendall’s W of 0.82 and an intraclass correlation coefficient (ICC) of 0.53. Ultimately, they found FEES was reliable in assessing PD for ACDF patients.

Discussion

The anterior cervical discectomy and fusion (ACDF) is one of the most common spinal procedures performed in the United States, with about 137,000 performed per year [2]. Due to its improved clinical and radiographic outcomes, the Smith-Robinson technique for ACDFs is considered the gold-standard surgery for single and multi-level cervical disc disease when more conservative treatments fail [1, 3, 8, 10, 30]. While the ACDF is considered generally safe, complications have been reported, the most common of these being postoperative dysphagia (PD) [1,2,3,4,5,6,7,8, 10, 18, 22, 30,31,32]. Short-term mild PD, usually defined as lasting less than three months, is so frequently seen that it can almost be considered an inevitable consequence of the operation [1]. However, this does not imply that transient PD need not be monitored closely, as it could result in long-term adverse events if ignored. Long-term PD is more concerning still, due to the increased likelihood of aspiration pneumonitis and poor nutrition intake, among other unfavorable sequelae. As this procedure is performed more frequently in the outpatient setting, it is important to recognize which patients are at higher risk of developing severe PD due to its associated morbidity.

One of the main challenges in interpreting the available literature is that the definition of dysphagia and objective classification measures are heterogeneous and inconsistent [19, 30]. This could be due to differences in objective measurements and an unclear understanding of the pathophysiology of dysphagia [7]. The stratification of dysphagia using specific scales is paramount in the attempt to precisely evaluate the incidence of and risk factors for this condition [33]. The end-goal of this review was to present findings and recommendations that could be used to develop a standardized risk stratification system or tool for long-term PD in postoperative ACDF patients. Without clarity in our definition of postoperative dysphagia with respect to time and severity, we will be unable to pinpoint the risk factors that could predict which patients will experience aspiration events.

We conducted a review of the PROMs and objective scales currently utilized in measuring the severity of PD after ACDF. PROMs can be a quick and efficient tool for measuring a patient’s subjective views on how PD is affecting their quality of life after ACDF [12, 15, 23]. These tools are only effective if they are valid and reliable in diagnosing a specific severity of PD, so a specific standard of care can be administered to optimize outcomes [15]. Furthermore, the specific risk factors for PD could be more clearly identified with specific and sensitive tools that can measure dysphagia accurately.

In the literature, the variables that surround ACDF and long-term PD are measured inconsistently and infrequently, which has limited quantitative inter-study comparison. For example, postoperative soft tissue swelling has been measured as a length [4, 34, 35], a ‘Postoperative Soft Tissue Swelling Index’ [36], a hazard ratio [4, 7, 35], and a percentage change [37]. One of the overarching difficulties has been to elucidate the true prevalence of this symptom based on a validated scale so that the diagnosis can be made with certainty. The diagnosis must then be followed with consistent questions as part of a scale to track the progression of this condition. PROMs can be delivered easily, quickly, and consistently, which removes practitioner bias from this line of questioning over the course of multiple follow-up visits for PD. Specifically, for the EAT-10, the internal consistency (Cronbach alpha) of the final instrument was 0.960, and the test–retest intra-item correlation coefficients approached 0.91 [17]. The integration of these PROMs into clinical settings and investigations has the potential to remove some of the noise from clinicians’ information processing so that patients suffering from this symptom can be categorized and treated more accurately.

Within current neurosurgery/spine literature, the most common subjective tests using patient reported outcome measures (PROMs) are the Bazaz Grading Scale, Eating Assessment Tool (EAT-10), Dysphagia Short Questionnaire (DSQ), Swallowing Quality of Life Questionnaire (SWAL-QOL), M.D. Anderson Dysphagia Inventory (MDADI), and Hospital for Special Surgery Dysphagia and Dysphonia Inventory (HSS-DDI).

Riley et al. [33] conducted a systematic review that concluded a need for a universal tool to assess the severity of PD. Several studies have used SWAL-QOL to evaluate PD, but it is a long 44-item questionnaire that takes an average of 15 min to complete. This time cost is not negligible and could easily lead to inefficiencies that could impact patient care. However, one study found that a shortened 16-item version of the SWAL-QOL was statistically and clinically significant, meaning this version could be practically adopted in the future [21].

Bazaz et al. created a short and simple grading system to differentiate between the different severities of PD. Although the scale is short and simple, the Bazaz grading system divides patients broadly into the categories “None”, “Mild”, “Moderate”, and “Severe” [12]. Bazaz is currently the most widely used dysphagia classification system within the current ACDF neurosurgical literature; however, the Bazaz scale has never been validated [12]. Multiple studies have indicated the need for more granular quantification of dysphagia to show changes in severity over time [22, 30].

One of the most accurate and practical subjective questionnaires utilized currently is the EAT-10 tool. EAT-10 consists of 10 questions that are each scored from 0 to 4, and this method has been shown to be more reliable than the commonly used Bazaz grading system [12]. Based on the studies, the most valid and reliable PROM is the HSS-DDI, which takes only about 2 min to administer [23]. The HSS-DDI was also found to have a very high diagnostic accuracy in stratifying mild, moderate, and severe PD [15].

The EAT-10 assesses the severity of dysphagia using 10 items with a 0–4 scale for each item, and it has been shown to be both valid and reliable [12, 17, 19]. The SWAL-QOL is a 44-item survey that assesses the impact oropharyngeal dysphagia may have on 10 quality of life areas, which are “food selection, burden, mental health, social function, eating duration, eating desire, sleep disturbance, fatigue, difficulty communicating, and fear of choking” [23]. The MDADI is a 20-item list that assesses the severity of dysphagia based on these 4 scales: global impact of dysphagia, impact of dysphagia on emotions, psychosocial function, and eating [23]. It has been validated as accurate in diagnosing PD, and therefore, it is used as a reference to measure the validity of newer dysphagia severity tests [22, 38]. The HSS-DDI is a 31-item list with a 1–5 ranking for each item to assist in measuring dysphagia after ACDF [23]. Otolaryngology literature routinely uses the above PROMs except for the Bazaz scale in assessing dysphagia. The aforementioned PROMs provide more data and quantitative measurements in the assessment of PD with opportunities for more granular data points, especially when longitudinally collected.

When compared to PROMs, objective tests are better at accurately assessing dysphagia. The Modified Barium Swallow (MBS) is the current gold-standard validated test for assessing oropharyngeal dysphagia [39]. MBS is a radiographic procedure that assesses patients’ oral, pharyngeal, and upper esophageal mechanics as they swallow. The standardized scoring of the MBS Impairment Profile (MBS-Imp) has been effective in improving the consistency of the impressions of the results of MBS studies [25]. Limitations for this study type are radiation exposure, intermittent recording due to use of radiation, and specialty equipment and staff requirements that could make MBS difficult to use in high-throughput situations [40]. Another objective test used is Flexible Endoscopic Evaluation of Swallowing (FEES), which is a procedure in which a Speech Language Pathologist (SLP) or otolaryngologist uses an endoscope to visualize the patient’s larynx, pharynx, and trachea before, during, and after the patient swallows [27]. The swallowing efficiency is assessed using different textures and sizes of food. FEES has also been established as a trustworthy and sensitive evaluation of swallowing disorders, with sensitivity values that were significantly greater than those of MBS [28]. Limitations of this examination are operator experience, swallow “white-out”, and resource expense for scope acquisition and maintenance [40].

These tests allow for dynamic evaluation of the phases of swallowing, which helps in determining specific structural causes of dysphagia. However, these observations do not always correlate well with the subjective symptoms each patient may have [41]. They also have a higher cost, take longer to administer, and are more invasive when compared to PROMs. The patients with only subjective dysphagia symptoms still have clinically significant manifestations of PD; therefore, both objective and subjective measures must be taken into consideration. This represents an imbalance in usefulness in the clinical setting versus the investigative setting. These procedures are more likely to be beneficial to a researcher who requires more scrutiny of individual patients/participants as opposed to a clinician who is attempting to rule out PD in a postoperative patient. For most patients, subjective scales that correlate best with the objective measures, such as the HSS-DDI and EAT-10 scale, are enough to diagnose PD and keep track of progress. For patients that are not improving or have known dysphagia prior to any operation, the use of objective measures should be implemented to get a more accurate representation of the cause of their PD so more specific treatment options can be selected.

There were limitations within this review and its referenced studies. Firstly, the number of studies analyzed for some of the dysphagia test categories was not large enough to support firm claims. Secondly, the level of evidence for many of the studies could not be determined, so the reliability of those specific studies is uncertain. More precise definitions of dysphagia would allow for more consistent characterization of PD and would help to better associate risk factors with the development of this symptom. The prospective literature does not frequently compare subjective or objective scoring systems. For this and other reasons, this study was unable to pool data in a structure that would support quantitative analysis of the differences among the included articles.