Introduction

Degenerative cervical diseases encompass several degenerative conditions of the cervical spine, such as disc herniation, spondylosis, and ossification of the posterior longitudinal ligament [1]. Anterior cervical spine surgery is an essential procedure for these pathologies, and dysphagia is among its most common complications. The reported incidence of dysphagia symptoms after anterior cervical spine surgery varies dramatically, from 1% to 87.5% [2,3,4]. Although the exact etiology is still unknown, this wide range may be due to exposure to different risk factors, including surgery length, increased cervical lordosis, multi-segmental procedures, and steroid application [4,5,6,7,8]. Another reason for the discrepancy is the lack of consensus on the best subjective self-report tool for dysphagia symptoms after anterior cervical spine surgery [9]. The inconsistent use of different measurement tools has hindered comparison of results across studies, thus limiting research progress in this area.

Instrumental assessments such as the Video-Fluoroscopic Swallow Study (VFSS) and Fiberoptic Endoscopic Evaluation of Swallowing (FEES) are of great value for patients with dysphagia symptoms. These gold-standard assessments detect oral, pharyngeal, and esophageal dysfunction and can thereby reveal the pathophysiology of swallowing difficulty [10, 11]. However, these objective measurements must be performed by specialists with special equipment and are not suitable for postoperative screening. Furthermore, dysphagia symptoms reflect the patient’s self-perception of swallowing difficulty, which does not always correspond to objective evaluations [12,13,14].

Patient-reported outcome measures (PROMs) are subjective evaluation tools based on patients’ personal views of their health status [15]. Several PROM instruments have been developed to measure dysphagia symptoms, such as the 10-item Eating Assessment Tool (EAT-10) and the Swallowing Quality of Life Scale (SWAL-QOL) [16, 17]. These subjective tools compensate for the shortcomings of objective evaluations and are therefore widely used [18]. Nevertheless, most of these instruments were developed for patients with head and neck cancer, esophageal diseases, or chronic neurological disorders. The mechanism of postoperative dysphagia differs from that of dysphagia as a sequela of these diseases; therefore, these tools may not be fully appropriate for patients who have undergone cervical spine surgery. To date, three PROM tools have been designed specifically for patients after cervical spine surgery: the Bazaz scale, the Dysphagia Short Questionnaire (DSQ), and the Hospital for Special Surgery-Dysphagia and Dysphonia Inventory (HSS-DDI) [19,20,21]. No study has compared these instruments, and the best choice for clinical practice and research remains unknown.

Recently, a study of postoperative dysphagia in an Asian population using the Bazaz scale showed that fusion surgery and increased lordosis were risk factors [8]. The authors further noted that the severity and exact causes of dysphagia could not be analyzed because this scale could not detect slight changes in severity [8]. Although the number of cervical surgeries performed in Asia is high and native-language assessment tools are essential, to our knowledge, no validated Chinese-language instrument has been designed specifically for patients after cervical surgery. The Bazaz scale had not been fully validated, while the DSQ and HSS-DDI were available only in English and had not been studied in the Chinese population. Chinese versions of these PROM tools therefore needed to be developed and their psychometric properties verified.

One goal of this study was to develop Chinese versions of the DSQ and HSS-DDI and verify their psychometric properties. Another aim was to evaluate and compare the reliability and validity of the Bazaz scale, the DSQ, the HSS-DDI, and its dysphagia subscale, using the MDADI as the reference criterion, in order to identify the most suitable tool for evaluating dysphagia symptoms after anterior cervical spine surgery.

The M.D. Anderson Dysphagia Inventory (MDADI) is a self-administered questionnaire developed to measure swallowing-related quality of life [22]. Many researchers have also used this tool to assess dysphagia in patients after anterior cervical surgery [20, 21, 23, 24]. Although this measure was not originally developed for patients after cervical surgery, we used it as the reference criterion because it was the only measurement that had been fully adapted and proven psychometrically valid and reliable in the Chinese population [25, 26].

Materials and Methods

Participants

Our study design and reporting followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD-2015) [27]. Our institute’s Research Ethics Committee approved this prospective study (No. GDREC 2017293H). One hundred and fifty consecutive patients diagnosed with degenerative cervical diseases who underwent anterior cervical surgery (anterior cervical discectomy and fusion (ACDF), anterior cervical corpectomy and fusion (ACCF), or anterior cervical disc arthroplasty (CDA)) performed by one senior surgeon from March 2019 to February 2020 were recruited for this study. Patients were excluded if they had preoperative dysphagia, had undergone revision procedures or procedures for conditions other than degenerative cervical diseases, or had dysphagia due to other diseases. All patients who agreed to participate signed informed consent forms. They were then assessed face-to-face with the surveys at their one-month follow-up visit (30 ± 5 days after surgery). The Chinese versions of the four surveys (Bazaz scale, DSQ, HSS-DDI, and MDADI) were printed on four separate sheets and presented to each participant in random order. Participants completed all of the sheets in a single session, with or without assistance. Clinical and surgical data, including sex, age, diagnosis, smoking history, and the number of involved segments, were also recorded.

Surveys for Postoperative Dysphagia

Bazaz Scale

The Bazaz dysphagia scale was the first tool for detecting dysphagia symptoms in patients with cervical diseases [19]. In this survey, patients are graded as having none, mild, moderate, or severe dysphagia based on their difficulty consuming liquids and solid foods. A numerical scoring system ranging from 0 (none) to 3 (severe) was introduced to facilitate further analysis, as described in other studies [28, 29]. A higher Bazaz score indicates more severe dysphagia.

Dysphagia Short Questionnaire (DSQ)

The DSQ was designed by Skeppholm et al. in 2012 to evaluate perceived swallowing difficulty after anterior cervical spine surgery [20]. It is scored by summing the points obtained from its 5 items, giving a DSQ score from 0 to a maximum of 18 points. A higher DSQ score represents more prominent dysphagia symptoms.

Hospital for Special Surgery-Dysphagia and Dysphonia Inventory (HSS-DDI)

The HSS-DDI is a “patient-derived, validated, and condition-specific” PROM tool developed by researchers at the Hospital for Special Surgery in 2018 [21]. This 31-item survey contains two domains, the dysphagia domain (20 items) and the dysphonia domain (11 items), which reflect the severity of dysphagia and dysphonia symptoms, respectively. The overall HSS-DDI score is calculated as the sum of all raw scores divided by the maximum possible score (124) and multiplied by 100 [21]. In this study, we also calculated a dysphagia items score (namely, the HSS-Dysphagia subscale) by converting the raw score of the dysphagia domain to a 100-point scale in the same way: the sum of the raw scores of the dysphagia items is divided by 80 (the maximum possible domain score) and multiplied by 100. For both scores, a higher value indicates less severe symptoms.
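The scoring rule above can be expressed as a short Python sketch. It assumes that each item is scored from 0 to 4 (the per-item maximum implied by 124/31 and 80/20) and that responses arrive already coded as in the original instrument; the function names are hypothetical and are not part of the HSS-DDI publication.

```python
from typing import Sequence

DYSPHAGIA_ITEMS = 20  # items in the dysphagia domain
DYSPHONIA_ITEMS = 11  # items in the dysphonia domain
ITEM_MAX = 4          # assumed per-item maximum (124 / 31 = 80 / 20 = 4)


def _to_100(raw: Sequence[int], n_items: int) -> float:
    """Sum raw item scores and rescale to 0-100 against the maximum possible sum."""
    if len(raw) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(raw)}")
    if any(not 0 <= r <= ITEM_MAX for r in raw):
        raise ValueError("item score outside 0..ITEM_MAX")
    return sum(raw) / (n_items * ITEM_MAX) * 100


def hss_ddi_overall(dysphagia: Sequence[int], dysphonia: Sequence[int]) -> float:
    """Overall HSS-DDI: sum of all 31 raw scores / 124 * 100."""
    return _to_100(list(dysphagia) + list(dysphonia), DYSPHAGIA_ITEMS + DYSPHONIA_ITEMS)


def hss_dysphagia_subscale(dysphagia: Sequence[int]) -> float:
    """HSS-Dysphagia subscale: sum of the 20 dysphagia raw scores / 80 * 100."""
    return _to_100(dysphagia, DYSPHAGIA_ITEMS)


# Hypothetical respondent: dysphagia raw sum 77 of 80, dysphonia raw sum 44 of 44
dysphagia = [4] * 18 + [3, 2]
dysphonia = [4] * 11
print(round(hss_dysphagia_subscale(dysphagia), 2))      # 96.25
print(round(hss_ddi_overall(dysphagia, dysphonia), 2))  # 97.58
```

Validating the item count and range before rescaling guards against incomplete questionnaires being silently scored against the wrong maximum.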

M.D. Anderson Dysphagia Inventory (MDADI)

The MDADI is a psychometrically validated, self-administered questionnaire widely used in patients with neoplastic or neurological diseases [22, 30]. The Chinese version of the MDADI is the only such measurement validated in the Chinese population [25, 26]. Here, we employed this well-validated and broadly accepted survey as the reference criterion. A higher MDADI Composite score indicates better daily functioning and a more favorable swallowing-related quality of life [22].

Translation of DSQ and HSS-DDI

Because all participants in this study were native Mandarin Chinese speakers, Chinese versions of these surveys were needed. The Bazaz scale and MDADI have been translated and used successfully in Chinese patient populations [25, 26, 31]. There was no published, validated Chinese version of the DSQ or HSS-DDI. With permission from the original authors of these measurement tools, we translated the DSQ and HSS-DDI following the guidelines for cross-cultural adaptation [32]. Briefly, the original English versions of the two questionnaires were translated independently into simplified Chinese by a bilingual spine surgeon and a professional translator. A single consensus version was then produced after comparing the two translations. Two bilingual spine surgeons who were unaware of the original English version performed the back-translation. A cultural adaptation committee composed of all translators, two additional spine experts, and one otolaryngologist further assessed these translations and determined the most appropriate wording of each question for the target population. The draft versions were agreed upon by consensus. Next, these draft versions were administered to 20 participants in face-to-face interviews, and any difficulties they had in understanding the surveys were recorded. Based on this feedback, the final simplified Chinese versions of the DSQ and HSS-DDI were constructed.

Psychometric Evaluation

The score distribution of each survey was assessed. Internal consistency was evaluated using Cronbach’s α coefficient and split-half reliability. Internal validity was examined by comparing scores between single-segmental and multi-segmental surgeries, because multi-segmental procedures are an established risk factor for postoperative dysphagia [4, 33, 34]. Confounding factors, including age, sex, and smoking history, were adjusted for using analysis of covariance (ANCOVA). Criterion validity was assessed by calculating Spearman or Pearson correlation coefficients.
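As an illustration of the two reliability statistics, the following Python sketch computes Cronbach’s α from the item and total-score variances and a split-half coefficient corrected with the equal-length Spearman–Brown formula. It assumes a first-half/second-half item split and sample (n − 1) variances; these choices are ours, and the SPSS procedure used in the study may differ in details such as how an odd number of items is divided.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """items[j][i] is respondent i's score on item j (k items x n respondents).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(items: Sequence[Sequence[float]]) -> float:
    """Correlate first-half and second-half totals, then apply the
    equal-length Spearman-Brown correction 2r / (1 + r)."""
    half = len(items) // 2
    first = [sum(resp) for resp in zip(*items[:half])]
    second = [sum(resp) for resp in zip(*items[half:])]
    r = _pearson(first, second)
    return 2 * r / (1 + r)


# Two perfectly parallel items give alpha = 1 and split-half = 1
parallel = [[1, 2, 3, 4], [1, 2, 3, 4]]
print(cronbach_alpha(parallel), split_half_reliability(parallel))
```

A questionnaire whose items are answered almost uniformly (for example, items that nearly everyone scores as 0) contributes little shared variance, which pulls both coefficients down.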

We chose a questionnaire rather than an instrumental assessment as the reference criterion. Several studies on dysphagia due to other diseases found that objective evaluations did not correlate well with reported symptoms [11,12,13,14]. Although objective evaluation tools are generally considered more accurate and stable than PROM tools, they may not fully reflect patients’ personal views of their dysphagia symptoms. Because analyzing a questionnaire’s psychometric properties involves comparing how well its items capture individual perceptions of symptoms, a subjective evaluation tool was the more suitable reference for our study. We used the MDADI as the reference criterion because it was the only measurement proven to be psychometrically valid and reliable in the Chinese population. According to previous studies, patients report “adequate” outcomes if their MDADI Composite scores are between 60 and 80 and “poor” outcomes if their scores are below 60 [35, 36]. We therefore used 60 < MDADI < 80 as the diagnostic criterion for mild dysphagia and MDADI < 60 for moderate/severe dysphagia. For each of the Bazaz scale, DSQ, and HSS-DDI, the optimal cut-off value was determined from the receiver operating characteristic (ROC) curve as the value maximizing the Youden index (sensitivity + specificity − 1). The diagnostic efficacy of these measurement tools was compared using the area under the ROC curve (AUC).
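The cut-off procedure can be sketched in Python as follows. This is a minimal illustration, assuming scales where either higher (Bazaz, DSQ) or lower (HSS-DDI) values indicate worse swallowing; it scans every observed score as a candidate threshold, keeps the one with the largest Youden index, and estimates the AUC with the Mann–Whitney pairwise formula. The handling of MDADI scores of exactly 60 or 80 is our assumption, since the criteria above use strict inequalities, and the data in the example are hypothetical.

```python
from typing import List, Sequence, Tuple


def mdadi_category(composite: float) -> str:
    """Reference grouping; exactly 60 is treated as mild and 80 as none (assumption)."""
    if composite < 60:
        return "moderate/severe"
    if composite < 80:
        return "mild"
    return "none"


def _flags(scores: Sequence[float], cutoff: float, higher_is_worse: bool) -> List[bool]:
    # Test-positive if the score is at least as bad as the cut-off
    return [s >= cutoff if higher_is_worse else s <= cutoff for s in scores]


def youden_cutoff(scores: Sequence[float], truth: Sequence[bool],
                  higher_is_worse: bool = True) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    best = None
    for c in sorted(set(scores)):
        flags = _flags(scores, c, higher_is_worse)
        tp = sum(f and t for f, t in zip(flags, truth))
        tn = sum(not f and not t for f, t in zip(flags, truth))
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[0]:
            best = (sens + spec - 1, c, sens, spec)
    return best[1], best[2], best[3]


def auc(scores: Sequence[float], truth: Sequence[bool], higher_is_worse: bool = True) -> float:
    """Mann-Whitney ROC area: P(positive scores worse than negative), ties count 1/2."""
    sign = 1 if higher_is_worse else -1
    pos = [sign * s for s, t in zip(scores, truth) if t]
    neg = [sign * s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical HSS-style scores (lower = worse) with an MDADI-based reference grouping
hss = [50, 65, 88, 85, 92, 95, 97, 100]
dysphagic = [True, True, True, False, False, False, False, False]
print(youden_cutoff(hss, dysphagic, higher_is_worse=False))  # (88, 1.0, 0.8)
print(auc(hss, dysphagic, higher_is_worse=False))
```

When several thresholds share the maximal Youden index, this sketch keeps the first one in ascending score order; statistical packages may break such ties differently.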

The minimum sample size for this study was 122, calculated according to Bujang et al. (with an estimated prevalence of dysphagia symptoms of 60% from our preliminary research and power = 0.81) [37]. We therefore recruited 150 consecutive participants, allowing for about 20% attrition. SPSS 25.0 software (SPSS Inc, Chicago, IL) was used for the statistical analyses. P < 0.05 was considered statistically significant.

Results

A total of 132 participants (82 males and 50 females) fully completed all surveys (88% response rate; see the flow diagram in Fig. 1). Ages ranged from 21 to 80 years (mean = 56.2 years), with 63 patients (47.7%) older than 55. Sixty-five patients (49.2%) had a smoking history. Most patients underwent discectomy and fusion (ACDF, 89.4%) for myelopathy (72.7%). Eighty-seven patients (65.9%) had single-segmental surgery, and 45 patients (34.1%) had two- or three-segmental surgery (Table 1). According to their MDADI scores, 43 patients (32.6%) and 14 patients (10.6%) were considered to have mild and moderate/severe dysphagia symptoms, respectively.

Fig. 1

The graph shows the study’s flow diagram according to the STARD-2015 statement

Table 1 Demographics of the participants who completed all four surveys

The score distributions of the four surveys are listed in Table 2. The mean scores of the Bazaz scale, DSQ, HSS-DDI, HSS-Dysphagia subscale, and MDADI Composite were 0.9, 2.4, 84.8, 83.5, and 77 points, respectively. Using a distribution-based method that sets the minimal clinically important difference (MCID) at 0.5 standard deviation, the MCIDs of the Bazaz scale, DSQ, HSS-DDI, and HSS-Dysphagia subscale were 0.5, 1.4, 9.7, and 10 points, respectively. About 56% and 40% of patients had the minimum possible score on the Bazaz scale and DSQ, respectively (0 points = no dysphagia). Floor or ceiling effects were less frequent for the HSS-DDI (29.5%) and the HSS-Dysphagia subscale (31.8%). Therefore, the HSS-DDI and HSS-Dysphagia subscale had better score distributions than the other two surveys.
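The distribution-based MCID and the floor/ceiling proportions reduce to simple arithmetic, sketched here in Python. The sketch assumes the sample standard deviation (n − 1 denominator), since the text does not state whether the sample or population standard deviation was used; the example scores are hypothetical.

```python
from statistics import stdev
from typing import Sequence, Tuple


def mcid_half_sd(scores: Sequence[float]) -> float:
    """Distribution-based MCID: half of the (sample) standard deviation."""
    return 0.5 * stdev(scores)


def floor_ceiling(scores: Sequence[float], floor: float, ceiling: float) -> Tuple[float, float]:
    """Proportions of respondents at the lowest and at the highest possible score."""
    n = len(scores)
    return (sum(s == floor for s in scores) / n,
            sum(s == ceiling for s in scores) / n)


# Hypothetical DSQ-like scores on a 0-18 range
dsq = [0, 0, 2, 2]
print(mcid_half_sd(dsq))
print(floor_ceiling(dsq, floor=0, ceiling=18))  # (0.5, 0.0)
```

For the Bazaz scale and DSQ, "no dysphagia" sits at the floor (0 points); for the HSS scores, where a higher value means fewer symptoms, it sits at the ceiling (100 points).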

Table 2 Distribution of the scores from all surveys

The internal consistencies, reflected by Cronbach’s α values and Spearman–Brown split-half reliability coefficients, are listed in Table 3. The HSS-DDI and HSS-Dysphagia subscale demonstrated the best internal consistency (α and r > 0.9). However, the Cronbach’s α for the DSQ was 0.454 and its Spearman–Brown coefficient was 0.258, revealing weak internal consistency for this scale. Thus, the HSS-DDI and the HSS-Dysphagia subscale were more reliable than the DSQ.

Table 3 Internal consistencies of the surveys

Scores on the Bazaz scale, DSQ, and HSS-DDI were significantly worse in patients who had undergone multi-segmental surgery than in those who had undergone single-segmental surgery (Table 4). These results implied that all three surveys had sufficient internal validity to distinguish the severity of dysphagia symptoms between single- and multi-segmental surgeries.

Table 4 Scores of all surveys in single- and multi-level surgery

We further examined criterion validity by performing correlation analyses. As shown in Table 5, all of the correlations between the surveys were statistically significant. There were excellent correlations between the HSS-DDI (overall score and dysphagia subscale) and the other surveys (all |r| > 0.7). The correlation coefficients of the Bazaz scale and DSQ with the MDADI Composite were −0.63 and −0.64, respectively, indicating that the criterion validity of the Bazaz scale and DSQ was not as good as that of the HSS-DDI. The ROC curves showed that the HSS-DDI and HSS-Dysphagia subscale had excellent diagnostic accuracy (all AUC > 0.9) for both mild and moderate/severe dysphagia (Fig. 2). The cut-off values of the HSS-DDI for mild dysphagia and for moderate/severe dysphagia were 90 (sensitivity = 0.93, specificity = 0.84) and 70 (sensitivity = 1.00, specificity = 0.85), respectively (Table 6). Meanwhile, the AUC of the DSQ was 0.926 for mild dysphagia but 0.895 for moderate/severe dysphagia, implying that the DSQ was less effective in diagnosing moderate or severe dysphagia. The Bazaz scale was the least accurate of the surveys (AUCs = 0.818/0.800). These results confirmed that the HSS-DDI and HSS-Dysphagia subscale had better criterion validity than the other questionnaires.
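For readers who want to reproduce the rank correlations, a dependency-free Python sketch of Spearman’s coefficient is given below: it assigns average ranks to tied scores and then takes the Pearson correlation of those ranks. The sign depends on scoring direction, which is why the Bazaz scale and DSQ (higher = worse) correlate negatively with the MDADI Composite (higher = better). This is an illustrative sketch with hypothetical data, not the SPSS routine used in the analysis.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    return _pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: Bazaz (higher = worse) vs MDADI Composite (higher = better)
bazaz = [0, 0, 1, 2, 3]
mdadi = [92, 85, 74, 61, 48]
print(spearman(bazaz, mdadi))  # negative, reflecting the opposite scoring directions
```

Average ranks matter here because scales such as the Bazaz scale produce many tied scores (most patients score 0).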

Table 5 Correlation coefficients (r values) between the surveys
Fig. 2

Graphic display shows Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curves (AUC) values of the four surveys (Bazaz scale/DSQ/HSS-DDI/HSS-Dysphagia) for diagnosing mild dysphagia (Left) and moderate/severe dysphagia (Right)

Table 6 Cut-off values of the surveys for mild or moderate/severe dysphagia

The average times spent on the Bazaz scale, DSQ, HSS-DDI, HSS-Dysphagia subscale, and MDADI were 0.5, 1.2, 5.8, 3.5, and 3.7 min, respectively (Table 2). Although the HSS-DDI had better reliability and validity, it took more time than the other surveys. The HSS-Dysphagia subscale, which focuses on dysphagia, achieved a good balance of effectiveness and efficiency. The cut-off values of the HSS-Dysphagia subscale for mild dysphagia and for moderate/severe dysphagia were 90 (sensitivity = 0.89, specificity = 0.87) and 70 (sensitivity = 1.00, specificity = 0.83), respectively. The HSS-Dysphagia subscale had criterion validity similar to that of the HSS-DDI but could be completed in less time. It also demonstrated good internal consistency and internal validity. Therefore, the HSS-Dysphagia subscale is recommended for evaluating dysphagia symptoms after anterior cervical surgery.

Discussion

The Importance of Comparing the Surveys

The mechanism of dysphagia after anterior cervical surgery is complicated and is considered to involve pharyngeal or upper esophageal dysfunction arising from anatomic, neurologic, and muscular disorders [38, 39]. Patient-reported outcome measures are convenient tools that are easy to apply for screening or follow-up evaluation and that reflect patients’ subjective perceptions. A specific and sensitive scale may help describe postoperative swallowing difficulty and quantify patients’ discomfort, leading to more precise and standardized outcome assessment. Conversely, an inaccurate scale may mislead clinicians’ judgments and affect the interventions provided. It is therefore crucial to study the validity and reliability of the existing scales and determine the most reliable means of assessment. In this study, we compared the surveys developed specifically to measure dysphagia symptoms after anterior cervical surgery in the Chinese population and found that the HSS-Dysphagia subscale outperformed the other relevant surveys.

HSS-DDI

The original version of the HSS-DDI was validated to measure dysphagia and dysphonia in this narrowly defined patient population. We translated and culturally adapted this survey while maintaining its psychometric properties. Although both dysphagia and dysphonia frequently occur after surgery, they are different complications with different mechanisms and therefore need to be analyzed separately [40, 41]. We extracted the dysphagia domain by converting the raw score of the dysphagia items into a 100-point scale and named it the “HSS-Dysphagia subscale.” Our results showed that this subscale had excellent reliability and validity, similar to the HSS-DDI, but with fewer items. Furthermore, we determined the HSS-Dysphagia subscale thresholds as 90 for mild dysphagia and 70 for moderate/severe dysphagia, with an MCID of 10 points. These results are close to those of a recent report on the MCID of the HSS-DDI [42]. We therefore recommend this scale for clinical application and research.

Bazaz Scale

Over 30 previous studies have measured the incidence of dysphagia using the Bazaz scale [2]. Despite its broad use for more than ten years, this scale has never been validated [9]. In Skeppholm’s study, the Bazaz scale did not correlate with the DSQ, the MDADI, or the EQ-5D [20]. Likewise, we found that the Bazaz scale had the lowest accuracy. This scale is based on the concept that postoperative dysphagia begins with solid food, as would be expected from an anatomic cause. However, dysphagia caused by impaired motility is more likely to affect both solids and liquids [38]. The etiology of postoperative dysphagia can be both anatomic and functional, and the spectrum of dysphagia symptoms does not simply range from solids to liquids [38, 39]. In addition, this four-point grading scale is not sensitive enough to detect slight changes in symptoms [8]. Assessing a patient’s dysphagia symptoms with the Bazaz scale, which merely asks about difficulty swallowing solids and liquids, is therefore considered less accurate.

DSQ

The Cronbach’s α coefficient for the DSQ was 0.454 in our study, which differed from that in Skeppholm’s study (0.82) [20]. All participants in our study were assessed one month after surgery, when their dysphagia tended to be transient or at an early stage and had resulted in neither weight loss nor pneumonia. Therefore, most of them scored 0 points on the fourth and fifth items (which concern weight loss and pneumonia), regardless of whether they had postoperative dysphagia. These near-uniform zero scores in the early postoperative period may explain the weak internal consistency of the DSQ. Additionally, many patients with severe but transient dysphagia did not have aspiration or pneumonia, which may explain the inferior efficacy of the DSQ in these patients. We found that the DSQ was excellent at detecting mild dysphagia symptoms, with high sensitivity and specificity, implying that this short questionnaire is suitable for routine postoperative care and screening.

Limitations

This study had several limitations. First, we did not perform an instrumental assessment as the reference standard, so the actual incidence of swallowing dysfunction in our cohort is unknown. Several studies have examined fluoroscopic or endoscopic abnormalities in patients with dysphagia following anterior cervical surgery [39, 43,44,45,46]. Despite the significant abnormalities found by instrumental assessments in these studies, objective findings did not always correlate with swallowing difficulty symptoms. One prospective study showed that none of the patients who developed radiographic evidence of swallowing abnormality after surgery reported clinical complaints of dysphagia [46]. Ian et al. showed a weak association between fluoroscopic abnormalities and EAT-10 scores in patients with chronic dysphagia but no association in those with acute dysphagia [43]. Objective evaluation cannot fully reflect patients’ personal views of their dysphagia symptoms, which may partially explain the weak association between objective and subjective examinations. However, the scales compared in this study were primarily designed to evaluate the severity of dysphagia symptoms rather than to diagnose swallowing function disorders. Although the MDADI was not explicitly developed for patients after cervical surgery, it is psychometrically valid and reliable in the Chinese population. We therefore chose the MDADI rather than objective assessments as the reference criterion so that the psychometric properties of these scales could be compared within a similar dimension.

Second, the carry-over effect in this within-subjects study design is difficult to eliminate and may have affected our results. We counterbalanced by presenting the questionnaires in a randomized order. Replication of the study in another cohort may help verify the results. Third, all patients in this study were assessed at one month postoperatively. Although most dysphagia occurs within one week after cervical surgery, patients have not fully returned to their routine lives by this time, so the effect of dysphagia on their quality of life cannot be adequately evaluated at this time point. Our follow-up was also not long enough to assess persistent dysphagia symptoms. However, the incidence of persistent dysphagia is relatively low, making it unsuitable for comparing the measurement tools. Moreover, the response rates did not reach 100% for all questionnaires, although the surveys were administered face-to-face. Two invalid answers in the DSQ and MDADI occurred because more than one option was selected within the same item and this was not noticed in time. Missing items occurred when patients stopped and refused to continue with the long list of questions, mainly in the HSS-DDI (16 of 150) and MDADI (10 of 150), reflecting the length of these two surveys. It is unknown whether participants’ responses by telephone or other approaches would have been the same. Lastly, we only examined the Chinese versions of these surveys in the corresponding population. Replication studies in other languages may confirm our results.

Conclusions

In this study, we evaluated and compared the validity of three surveys designed specifically to measure dysphagia symptoms after anterior cervical surgery in Chinese patients. The HSS-Dysphagia subscale surpassed the other scales and is recommended for clinical application and research. By contrast, the Bazaz scale was less accurate than the other questionnaires.