Introduction

Dysphagia is a common health concern that affects people around the world. It reportedly affects 300,000–600,000 persons per year in the US [1] and is a health burden, especially for the elderly [1,2,3]. Patients with dysphagia are at increased risk of medical problems such as aspiration, pneumonia, and malnutrition [1, 2]. Post-stroke dysphagia (PSD), a common complication affecting 37–78% of stroke patients [4, 5], is the most common type of dysphagia [1]. Although some cases of PSD resolve spontaneously, 11–50% of stroke patients still experience dysphagia 6 months after the stroke [5, 6], which leads to poor outcomes and a life-long increase in mortality and morbidity [1, 3,4,5,6].

Because of the many adverse effects of PSD, precise evaluation of its severity is crucial for early intervention. Currently, the severity of PSD is usually assessed with various scales by means of bedside screening tests and instrumental tools [6, 7]. Bedside screening tests provide a glimpse of the patient’s general swallowing condition through observation and consultation. Among instrumental tools, the “videofluoroscopy (VF) swallowing study” and “fiberoptic endoscopic evaluation of swallowing (FEES)/videoendoscopy (VE)” are the two most commonly used methods for assessing dysphagia [1, 6, 7]. VF is a dynamic exploration that evaluates the safety and efficacy of deglutition [7]. Patients undergoing a VF swallowing study take radiopaque materials orally in a sitting position; the physicians then thoroughly examine the entire swallowing process from the oral region through the esophagus down to the stomach [7]. VF serves as the traditional gold standard for dysphagia evaluation. VE, on the other hand, is performed by physicians who pass a laryngoscope transnasally to the hypopharynx. By observing the movement of the laryngopharyngeal structures while the patient swallows food of different textures, the physicians examine whether there are any dynamic abnormalities during this period [6, 7]. With these subjective and objective modalities, patients with PSD can be evaluated more efficiently [8, 9].

Recently, various scales for evaluating PSD have emerged. By grading the outcomes into multiple levels according to the severity of dysphagia, clinical staff gain more insight into the patient’s situation for further intervention. Although VF remains the first choice for PSD assessment because of its accuracy, it is not used in all hospitals owing to differing clinical situations and regional differences. Therefore, this study aims to investigate the consistency among various scales evaluated by different means, which could aid the future assessment of patients with PSD when VF is not available.

Methods

This study followed the Declaration of Helsinki and was reviewed and approved by the Institutional Review Board (IRB) of Kaohsiung Municipal Siaogang Hospital (IRB number: KMUHIRB-F(II)-20190133). The patients and their relatives agreed to all data acquisition for research purposes.

Study design

A total of 49 patients who received swallowing examinations from 2019 to 2020 in Kaohsiung Municipal Siaogang Hospital were enrolled in this study. The inclusion criterion was a previous stroke event followed by symptoms of dysphagia. Patients with dysphagia prior to the stroke event and those who did not experience post-stroke dysphagia were excluded. All PSD patients underwent five swallowing examinations after the active stroke events had subsided. The tests are introduced as follows:

  1. Functional Oral Intake Scale (FOIS), which is performed by physicians with either VF or VE, reflects the patient’s functional oral intake. The FOIS has seven levels: no oral intake (level 1), tube dependent with minimal/inconsistent oral intake (level 2), tube supplements with consistent oral intake (level 3), total oral intake of a single consistency (level 4), total oral intake of multiple consistencies requiring special preparation (level 5), total oral intake with no special preparation, but must avoid specific foods or liquid items (level 6), and total oral intake with no restrictions (level 7). Patients ranked from “level 1” to “level 3” are categorized into the “tube-dependent group”, and those ranked from “level 4” to “level 7” into the “tube-independent group”.

  2. Dysphagia Severity Scale (DSS), which may be performed by physicians with either VF or VE or by the nursing staff based on their judgement, is a useful instrument for determining the severity of dysphagia. The DSS has seven levels: saliva aspiration (level 1), food aspiration (level 2), water aspiration (level 3), occasional aspiration (level 4), oral problem (level 5), minimum problem (level 6), and within normal limits (level 7). Patients ranked from “level 1” to “level 4” are categorized into the “choking/aspiration group”, and those ranked from “level 5” to “level 7” into the “without choking/aspiration group”.

  3. Ohkuma Questionnaire, a convenient and validated instrument for assessing the overall swallowing condition over the past 3 months, comprises fifteen easily understood questions [10]. Each answer is rated “1 for severe symptoms”, “2 for mild symptoms”, or “3 for absence of symptoms”. Dysphagia is highly suspected if at least one answer is rated as severe symptoms; otherwise, the patient is categorized into the non-dysphagia group.

  4. Eating Assessment Tool-10 (EAT-10), a well-recognized questionnaire for dysphagia evaluation, covers ten common clinical situations related to the patient’s swallowing difficulties over the past 3 months. Each answer is rated according to the severity of the symptoms from “0” (no problem) to “4” (severe problem). If the total score of all questions is more than three, the patient is highly suspected of having abnormal swallowing.

  5. Repetitive Saliva Swallowing Test (RSST), a convenient measurement for dysphagia evaluation that requires an operator, provides insight into the patient’s functional swallowing outcome. The operator has the patient sit upright in a comfortable setting and may moisten the patient’s mouth with a small amount of water. The operator then gently places the index finger at the level of the hyoid bone and the other fingers on the neck; each movement of the Adam’s apple past the middle finger is counted as one swallow. The patient is asked to swallow repeatedly over the next 30 s, and the result is abnormal if fewer than three swallows are detected during this period.

Among the above methods, FOIS was performed by the physicians only, and DSS was conducted by both the physicians and the nursing staff; the physicians used either videofluoroscopy (VF) or videoendoscopy (VE) for evaluation, while the nursing staff assessed the patients by observation and subjective judgement. The remaining assessment tools relied solely on the patients’ personal experience and subjective feelings.

For data collection and interpretation, we reviewed the patients’ medical records and swallowing examination reports, and then investigated the consistency among the various scales and methods.

Data processing and statistical analysis

To establish a standardized protocol for data analysis and comparison, we reorganized the results of the scales according to the severity of dysphagia as classified by the original design of each modality. The data of each scale were categorized into a “mild group” and a “severe group” based on the clinical assessment of dysphagia symptoms, which made them easier to compare. The details are listed below:

  1. FOIS: those ranked from “level 1” to “level 3” are classified as the “tube-dependent group (severe group)”, while those ranked from “level 4” to “level 7” are classified as the “tube-independent group (mild group)”.

  2. DSS: those ranked from “level 1” to “level 4” are classified as the “choking/aspiration group (severe group)”, while those ranked from “level 5” to “level 7” are classified as the “without choking/aspiration group (mild group)”.

  3. Ohkuma Questionnaire: those with at least one answer rated “severe symptoms” among the 15 questions are classified as the “severe group”, otherwise as the “mild group”.

  4. EAT-10: those with a total score of more than three are classified as the “severe group”, otherwise as the “mild group”.

  5. RSST: those who swallow fewer than three times in 30 s are classified as the “severe group”, otherwise as the “mild group”.
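The grouping rules listed above can be written down as a short Python sketch. This is only an illustration of the stated cut-offs, not the procedure actually used in the study (the analysis was performed in SPSS); the function names and the "severe"/"mild" string labels are hypothetical.

```python
from typing import Sequence


def fois_group(level: int) -> str:
    """FOIS levels 1-3 = tube-dependent (severe); levels 4-7 = tube-independent (mild)."""
    if not 1 <= level <= 7:
        raise ValueError("FOIS level must be between 1 and 7")
    return "severe" if level <= 3 else "mild"


def dss_group(level: int) -> str:
    """DSS levels 1-4 = choking/aspiration (severe); levels 5-7 = without (mild)."""
    if not 1 <= level <= 7:
        raise ValueError("DSS level must be between 1 and 7")
    return "severe" if level <= 4 else "mild"


def ohkuma_group(answers: Sequence[int]) -> str:
    """15 answers, each 1 (severe), 2 (mild) or 3 (absent); any 1 gives the severe group."""
    if len(answers) != 15 or any(a not in (1, 2, 3) for a in answers):
        raise ValueError("expected 15 answers rated 1, 2 or 3")
    return "severe" if any(a == 1 for a in answers) else "mild"


def eat10_group(scores: Sequence[int]) -> str:
    """10 items, each scored 0-4; a total of more than three gives the severe group."""
    if len(scores) != 10 or any(not 0 <= s <= 4 for s in scores):
        raise ValueError("expected 10 item scores between 0 and 4")
    return "severe" if sum(scores) > 3 else "mild"


def rsst_group(swallows_in_30s: int) -> str:
    """Fewer than three swallows in 30 s gives the severe group."""
    if swallows_in_30s < 0:
        raise ValueError("swallow count cannot be negative")
    return "severe" if swallows_in_30s < 3 else "mild"
```

Reducing every scale to the same two labels is what allows the scales to be cross-tabulated pairwise in the agreement analysis.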

Among the above measurements, FOIS was evaluated by VF and VE, while DSS was evaluated by VF, VE, and the nursing staff.

To verify the consistency between the gold standard VF and the other scales and methods, we first used the kappa coefficient to compare the “tube-dependent/tube-independent” groups of FOIS performed by VF (VF-FOIS) with those of FOIS performed by VE (VE-FOIS), as well as with the “mild/severe” groups assessed by the Ohkuma Questionnaire, EAT-10, and RSST. We then used the kappa coefficient to compare the “choking or aspiration/without choking or aspiration” groups of DSS performed by VF (VF-DSS) with those of DSS performed by VE (VE-DSS) and the nursing staff (Nurse-DSS), as well as with the “mild/severe” groups assessed by the Ohkuma Questionnaire, EAT-10, and RSST. Lastly, we used weighted kappa coefficients to analyze the agreement between FOIS and DSS using VF (VF-FOIS vs VF-DSS) and VE (VE-FOIS vs VE-DSS), respectively. A p value < 0.05 was considered statistically significant, and 95% confidence intervals (CIs) are reported. Kappa values were interpreted according to the Landis and Koch classification: < 0 = poor agreement; 0.01–0.2 = slight agreement; 0.21–0.4 = fair agreement; 0.41–0.6 = moderate agreement; 0.61–0.8 = substantial agreement; 0.81–1.0 = excellent agreement [11]. All statistical analyses were performed with IBM SPSS Statistics Version 22.
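To make the agreement statistics concrete, the following Python sketch computes Cohen's kappa from two paired lists of ratings and maps the value to the agreement bands quoted above. It illustrates the general technique only and is not the SPSS procedure used in this study. Several points are assumptions: the function names are hypothetical, the weighted variant uses linear weights (the weighting scheme is not specified in the text), and a kappa between 0 and 0.2 inclusive is labelled "slight" because the quoted bands leave the value 0 unassigned.

```python
from typing import Hashable, Optional, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
                categories: Sequence[Hashable],
                weights: Optional[str] = None) -> float:
    """Cohen's kappa; weights='linear' gives a linearly weighted kappa (assumed scheme).

    `categories` must list the rating levels in their natural order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n, k = len(rater_a), len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (rating by A, rating by B).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return abs(i - j) / (k - 1)
        raise ValueError("weights must be None or 'linear'")

    observed = sum(disagreement(i, j) * obs[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    if expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected


def landis_koch(kappa: float) -> str:
    """Agreement label using the bands quoted in the text (0 treated as 'slight')."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")):
        if kappa <= upper:
            return label
    return "excellent"


if __name__ == "__main__":
    # Made-up ratings for four hypothetical patients, not study data.
    vf = ["severe", "severe", "mild", "mild"]
    ve = ["severe", "mild", "mild", "mild"]
    k = cohen_kappa(vf, ve, categories=["severe", "mild"])
    print(k, landis_koch(k))
```

For two-category ratings the weighted and unweighted statistics coincide, so weighting only matters when the full seven-level FOIS and DSS ratings are compared.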

Results

The basic characteristics of the PSD patients are presented in Table 1. When the “choking/aspiration” group and the “without choking/aspiration” group based on VF-DSS were compared, the two groups were similar in gender, age, hypertension, diabetes mellitus, hyperlipidemia, atrial fibrillation, and stroke patterns. Only the mRS rating differed significantly between the two groups (p = 0.014). When the “tube-dependent” group and the “tube-independent” group based on VF-FOIS were compared, no significant differences were found in any of the parameters.

Table 1 Basic characteristics of the patients receiving swallowing examinations

The inter-rater reliabilities between VF-FOIS and the other measurements were then analyzed, and the results are presented in Table 2. VE-FOIS (κ = 0.625, 95% CI 0.300–0.950; p < 0.001) was the only measurement showing substantial, statistically significant agreement with VF-FOIS. The other measurements, including the Ohkuma Questionnaire (κ = 0.016, 95% CI −0.090 to 0.122; p = 0.778), EAT-10 (κ = 0.034, 95% CI −0.084 to 0.152; p = 0.590), and RSST (κ = 0.187, 95% CI −0.131 to 0.504; p = 0.142), showed only slight agreement with VF-FOIS, without statistical significance.

Table 2 Inter-rater reliability among the scales and VF-FOIS

As to the inter-rater reliabilities between VF-DSS and the other scales and methods (Table 3), we found that VE-DSS (κ = 0.381, 95% CI 0.127–0.636), EAT-10 (κ = 0.269, 95% CI −0.006 to 0.543), and the Ohkuma Questionnaire (κ = 0.213, 95% CI −0.057 to 0.484) were in fair agreement with VF-DSS. In addition, RSST (κ = 0.198, 95% CI −0.010 to 0.406) and Nurse-DSS (κ = 0.105, 95% CI −0.183 to 0.393) had slight agreement with VF-DSS. Nevertheless, VE-DSS was the only scale with a statistically significant kappa value (p = 0.007) among all the analyzed scales.

Table 3 Inter-rater reliability among the scales and VF-DSS

Finally, the consistency between FOIS and DSS was analyzed separately for VE (VE-FOIS versus VE-DSS) and VF (VF-FOIS versus VF-DSS) for comparison (Table 4). The weighted kappa between FOIS and DSS in VE (weighted κ = 0.577, 95% CI 0.414–0.740) was higher than that in VF (weighted κ = 0.249, 95% CI 0.136–0.362), and both values were statistically significant.

Table 4 Comparison of VF and VE in analysis of the inter-rater reliability between FOIS and DSS

Discussion

The study aimed to investigate the consistency among various scales used to evaluate PSD. The findings demonstrated that only VE exhibited a statistically significant agreement with VF for both FOIS and DSS. Other clinical bedside assessments were not significantly consistent with VF. As a result, VE may be considered the best alternative to VF.

Post-stroke dysphagia is a common and costly complication of acute stroke, which increases the risk of mortality, morbidity, and institutionalization, partly due to the risks of aspiration, pneumonia, and malnutrition [6]. Early assessment of dysphagia can help clinicians identify patients at risk of complications. Many studies have focused on specific bedside screening tools for dysphagia [12]. However, no prior research has used as many dysphagia screening methods as our prospective study, which aimed to determine the consistency between these methods and VF in stroke patients. All the screening methods used in our study are commonly used in clinical practice in Asia and have been shown to be reliable and valid for dysphagia detection [13,14,15,16,17]. Although bedside rating scales are convenient and cost-effective, a systematic review concluded that none of the bedside screening protocols can adequately predict the presence of aspiration, except for maneuvers based on VF or VE [12]. Most bedside swallow examinations were deemed insufficiently sensitive to serve as a screening test for dysphagia [12]. Another review from the Cochrane Library also found that no bedside swallow screening tool with both high and precisely estimated sensitivity and specificity could be identified [18]. These results are consistent with our finding that none of the bedside assessment methods, including the Ohkuma Questionnaire, EAT-10, and RSST, showed significant consistency with VF. A possible explanation is that VF and VE can precisely detect whether the test material has entered the airway during the examination. This implies that silent aspiration can be effectively observed, making it easier to identify patients who are at risk of developing complications.

VF has been considered the traditional gold standard for dysphagia screening due to its high accuracy. However, limitations including radiation exposure, invasiveness, equipment dependence, and relatively high cost hinder VF from being widely used. VE, on the other hand, is a relatively novel instrument for dysphagia assessment that has been promoted since the 1990s [19]. Owing to the portability of VE equipment, the absence of ionizing radiation, and the availability of testing materials, VE is particularly valuable in various situations, such as bedridden or immobilized patients, patients in the ICU or on monitors, repeated examinations, and patients who need to avoid ionizing radiation exposure [19]. There have been numerous comparisons between VF and VE, with studies claiming that VE is also an effective instrument for dysphagia evaluations and outcomes [19, 20]. These studies demonstrated that VE is more sensitive than VF when evaluating swallowing safety, as VE had a slight advantage in detecting aspiration, penetration, and residues compared to VF [19, 20]. In one study, the Penetration–Aspiration Scale (PAS), pixel-based circumscribed area ratio, and Yale Pharyngeal Residue Severity were each applied in both VF and VE, and strong positive correlations and agreement were found between VF and VE [21]. Our research also revealed that VE has a statistically significant agreement with VF for both FOIS and DSS. Based on the above, it is plausible that VE is not inferior to VF in clinical practice. A recent review deemed VE a beneficial first-line examination and considered that using VF after VE could be advantageous in obtaining complete visualization and ensuring that no aspiration events are overlooked during the VE procedure [22]. VF is capable of providing real-time evaluation of all four phases of swallowing, whereas VE might not capture the oral preparatory and esophageal phases. It is noteworthy that VF holds an advantage over VE in assessing the upper esophagus [23].

The study has some limitations. First, the small sample size may have affected the results, as tools other than VE did not show statistical significance. This lack of power could be due to the limited number of participants. The small sample size also led to an uneven distribution of individuals among the groups. Specifically, the severe VF-FOIS group consisted of only 4 patients, which reduces the statistical power of the study. Further research with a larger sample size is necessary to confirm our findings. Second, the recruited patients' baseline functional severity and cognitive status were not examined, although both factors can affect swallowing and should be considered in future studies. Third, the assessment tools employed in our study differ in their nature and function. The Ohkuma questionnaire and EAT-10 are questionnaire-based tools that rely on personal judgment and focus primarily on the patient's dysphagia-related life experiences. The RSST, DSS, and FOIS, on the other hand, are evaluated by medical professionals, which adds a more objective component to their assessments. The DSS and FOIS are designed to evaluate the severity of dysphagia based on clinical swallowing situations, whereas the Ohkuma questionnaire, EAT-10, and RSST are intended primarily for screening potential dysphagia patients. Because dysphagia severity in every case reflects the patient's clinical swallowing status and differs only in degree, we consider the comparisons in the present study reasonable despite these differences between the scales. Lastly, many other bedside screening tools were not included in this study. For a more comprehensive understanding, future studies could consider incorporating these additional screening tools.

Conclusion

In conclusion, none of the bedside screening tools showed statistically significant agreement with VF. Only VE has a statistically significant agreement with VF for assessing PSD. Because of its invasiveness, cost, and limited availability, VF is not always an option for dysphagia evaluation. Therefore, VE can be considered a reliable alternative to VF, particularly when VF is unavailable or unsuitable.