Introduction

Background

Parkinson’s disease (PD) is a progressive neurodegenerative disorder that affects up to 3% of elderly adults and is projected to affect more than 9 million adults worldwide by 2030 [1]. It has been estimated that over 80% of individuals with Parkinson’s disease will experience dysphagia at some point in their disease progression [2]. Swallowing difficulties can lead to poorer quality of life, reduced ability to take medications on schedule, and increased risk for malnutrition and dehydration [3]. Aspiration pneumonia is currently the leading cause of death in the PD population, accounting for approximately 70% of deaths among individuals with the disease [4].

Estimates of dysphagia prevalence in the PD population vary widely. A prevalence of 35% has been reported based on subjective dysphagia complaints but the prevalence approaches 85% when objective dysphagia measures are used [5]. There is a clear mismatch between patient perceptions and findings from objective testing, with silent aspiration occurring in some cases within the first two years of diagnosis [6]. Collectively, these factors make clinical assessment of dysphagia and aspiration-risk in individuals with PD difficult [7]. Currently there is no well-established screening protocol for dysphagia in PD. In one study of PD patients hospitalized with pneumonia, only 13% had previously undergone a swallowing evaluation, suggesting that referral criteria are not well established [8]. A swallowing screening test is a quick and inexpensive test that can be administered by individuals without specialized expertise to determine the need for further evaluation [9]. Several protocols have been recommended, including the use of population-specific screening questionnaires [10, 11], water swallow protocols [12, 13], clinical swallowing evaluations [14], voluntary cough measures [15], and a combination of clinical criteria [2, 7, 16]. To date, none has gained widespread acceptance.

In a recent study, we investigated the predictive value of three swallow screening tools in a cohort of consecutive patients with PD with no confounding medical conditions to determine their accuracy in predicting abnormal airway protection on a videofluoroscopic swallowing study (VFSS) [17]. As described in greater detail below, the screening protocol included three different modes of assessment, each of which has been used previously as a screening tool in the PD population, namely a self-report dysphagia survey, a water swallow test, and a measure of voluntary cough strength. The survey measure used was the Swallow Disturbance Questionnaire (SDQ), a validated self-report tool specifically developed for the identification of early dysphagia in individuals with PD [18]. In its original validation, the questionnaire was shown to detect symptoms of dysphagia in the PD population with a sensitivity and specificity of 81%. It had been previously recommended that patients with a total SDQ score of 11 or more should be referred for a comprehensive swallowing evaluation that includes objective testing. Unfortunately, we were not able to replicate this finding in our study and were also not able to find an alternate cut-off score that was more accurate. There were a number of limitations in the previous study. Abnormal airway protection was identified using the worst score on the Penetration–Aspiration Scale (PAS) [19] across a series of bolus trials. Although this is common in both clinical practice and research [20], there is some debate about the accuracy of this methodology [21, 22]. In addition, airway protection was the only measure of swallowing function that was assessed; other aspects of swallowing, including swallowing efficiency, were not considered. One further limitation was that the previous investigation examined the SDQ total score but did not explore the survey’s two subdomains (i.e., Oral and Laryngopharyngeal scores) to see if either of these might be more strongly associated with objective measures. The current study was undertaken to address these questions.

The Dynamic Imaging Grade of Swallowing Toxicity (DIGEST) is a measure of global swallowing severity that uses established metrics of swallowing safety and efficiency to assign a global grade of pharyngeal swallow function based on VFSS findings [23]. Based on the frequency, pattern, and amount of penetrated or aspirated material, a score is assigned to provide an overall safety grade. The maximum percentage of pharyngeal residue from the first swallow of each bolus consistency is used to determine an efficiency grade. The DIGEST was validated for a modified barium swallow (MBS) protocol that included two trials each of 5-mL, 10-mL, and self-administered cup sip volumes of thin liquid barium, barium pudding, and a cracker coated in barium paste, and it is not applied to swallow attempts where compensatory strategies were trialed. Even though the DIGEST was originally developed for use in the head and neck cancer (HNC) population, a number of studies have used it to classify dysphagia in patients with neurologic disease including amyotrophic lateral sclerosis (ALS) [24, 25], anoxic brain injury [26], oculopharyngeal muscular dystrophy (OPMD) [27], and PD [28]. To date, however, the DIGEST protocol has not been formally evaluated for use with neurogenic dysphagia, and one of the aims of this study was to further examine the utility of this measure for individuals with PD.

Study Aims

In this study, we sought to address the following aims:

Aim 1: To evaluate the intra- and inter-rater reliability of DIGEST scores and whether DIGEST scores were associated with clinician assessments of overall severity on VFSS.

Aim 2: To determine whether those with normal and abnormal SDQ scores differ in their safety, efficiency, or overall severity using the DIGEST, or with regard to other characteristics.

Aim 3: To examine whether the SDQ subscores (i.e., Oral and Laryngopharyngeal) or other patient characteristics were associated with DIGEST scores.

Methods

Participants

The current study is a secondary analysis of data previously collected using a standardized collection protocol in a large cohort of individuals with PD that has been described previously [17]. All procedures were approved by the Oregon Health and Science University Institutional Review Board (IRB#17516). The data collected during that study were entered into a database which was then used for the current investigation. Data were systematically collected on consecutive individuals who were referred for a swallowing evaluation for symptoms of dysphagia and met the study’s inclusion criteria from 1/1/18 to 2/28/20. The participants in that study were individuals with idiopathic PD who had no comorbid conditions that could affect swallowing (including previous radiation or surgery to the head and neck, trauma, or other neurological conditions). A self-reported dysphagia measure, the SDQ [18], was completed by each participant prior to undergoing VFSS using a standardized protocol, as described in greater detail below. All participants were in the clinical “on” state with patients being asked to take their PD medications before the examination. Demographic and disease-related variables were collected from a patient interview as well as the electronic medical record. The variables in the database included age, gender, body mass index (BMI), previous VFSS, history of pneumonia, diet status, and duration of PD. Diet status was measured using the Functional Oral Intake Scale (FOIS) [29]. PD severity was based on the Hoehn and Yahr (H&Y) scale [30]. A clinical measure of swallowing efficiency was calculated using the 100 ml Timed Water Swallow Test (TWST) [12]. Swallowing efficiency on the TWST is calculated by dividing the volume of water consumed by the amount of time to complete the task with a value of less than 10 ml/second considered abnormal [31]. 
A hand-held peak flow meter (TruZone; Monaghan Medical Corporation, Syracuse, New York, USA) with a measurement range of 60 to 800 L/min was used to measure volitional peak cough flow (PCF).
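The TWST calculation described above can be illustrated with a minimal Python sketch. The function names and the `cutoff_ml_per_s` parameter are illustrative assumptions rather than part of the published protocol; only the formula (volume divided by time) and the 10 ml/second threshold are taken from the text.

```python
def twst_efficiency(volume_ml: float, duration_s: float) -> float:
    """Return swallowing efficiency in ml/second (volume consumed / time taken)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if volume_ml < 0:
        raise ValueError("volume_ml must not be negative")
    return volume_ml / duration_s


def twst_is_abnormal(volume_ml: float, duration_s: float,
                     cutoff_ml_per_s: float = 10.0) -> bool:
    """Flag efficiency below the cut-off (10 ml/second in the text) as abnormal."""
    return twst_efficiency(volume_ml, duration_s) < cutoff_ml_per_s
```

For instance, a participant who drinks the full 100 ml in 8 seconds has an efficiency of 12.5 ml/second, which would fall in the normal range under this threshold.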

Procedure

The SDQ is a validated self-report tool for the identification of early dysphagia in individuals with PD [18]. The survey consists of 15 questions and asks the respondent to rate the frequency of each item. Five items relate to the oral phase of swallowing (e.g., “Do you experience difficulty chewing solid food, like an apple, cookie or a cracker?”) and ten items relate to the pharyngeal phase (e.g., “Do you cough while swallowing liquids?”). In its original validation study, the questionnaire was reported to detect symptoms of dysphagia in this patient population with a sensitivity and specificity of 81%. The authors recommended that PD patients with a total SDQ score of 11 or more should be referred for a comprehensive swallowing evaluation that includes objective testing. Consequently, a score of 11 on the SDQ was used in the current study as a cut-off value to differentiate those with “normal” and “abnormal” scores. The SDQ also includes two subscales: an Oral score can be calculated by summing the first five items and a Laryngopharyngeal score by summing the remaining ten items [11].
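The subscale and cut-off logic described above can be sketched as follows. This is a minimal illustration that assumes the 15 item responses have already been converted to numeric scores according to the instrument's own scoring rules, which are not reproduced here; the names `SDQResult` and `score_sdq` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

SDQ_CUTOFF = 11  # total score of 11 or more is treated as "abnormal" in the text


@dataclass(frozen=True)
class SDQResult:
    oral: float
    laryngopharyngeal: float
    total: float
    abnormal: bool


def score_sdq(item_scores: Sequence[float]) -> SDQResult:
    """Sum items 1-5 (Oral) and items 6-15 (Laryngopharyngeal) of the SDQ."""
    if len(item_scores) != 15:
        raise ValueError("the SDQ has exactly 15 items")
    oral = sum(item_scores[:5])
    laryngopharyngeal = sum(item_scores[5:])
    total = oral + laryngopharyngeal
    return SDQResult(oral, laryngopharyngeal, total, total >= SDQ_CUTOFF)
```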

Each participant had undergone VFSS using a standardized protocol. During the VFSS, images of the oropharynx and cervical esophagus were recorded in the lateral plane while the participant swallowed standardized consistencies of Varibar© barium contrast in the following sequence: honey (5 ml), nectar-thick liquid (5 ml, 10 ml, 20 ml), thin liquid (5 ml, 10 ml, 20 ml), pudding (5 ml), and ¼ of a graham cracker with pudding barium (3 ml). The 5 ml boluses were presented via teaspoon, and the 10 ml and 20 ml liquid boluses were presented in a medicine cup. For the 20 ml liquid boluses, participants were instructed to “Drink the liquid until it is gone,” which resulted in either sequential swallows or one large swallow. For all other bolus sizes and consistencies, participants were instructed to hold the bolus in the mouth and then swallow the entire bolus when cued. Images were captured using continuous imaging and recorded at a rate of 30 frames per second. Video recordings were stored digitally in a picture archiving system (IMPAX) and then downloaded for subsequent analysis.

Sample Selection

As there are no normative data for the SDQ, data from our previous analysis were used to perform a power analysis [17]. With 80% power and an alpha level of 0.10, a sample of 20 patients was required for comparison. A number of authors have advocated for the use of higher alpha levels in preliminary investigations to reduce the risk of a type 2 error [32]. A total of 47 eligible individuals had undergone the VFSS protocol during the initial study period. Upon review of the VFSS recordings, 13 studies were found to be incomplete or not suitable for scoring because imaging for at least one of the nine boluses administered had not been recorded or archived (n = 11), at least one of the boluses could not be scored using the DIGEST because the SLP had used a compensatory maneuver for that bolus (n = 1), or the study had inadequate visualization for scoring due to patient dyskinesia (n = 1). The remaining 34 studies were then subdivided into two groups using the recommended cut-off score of 11 on the SDQ. There were 15 individuals with SDQ scores in the normal range (i.e., below 11) and 19 individuals with abnormal SDQ scores (i.e., 11 or more). Women made up a minority of the study sample (24%). In order to control for gender effects, 8 males and 2 females were randomly selected from each group to comprise the final sample of 20 participants. The selection of the study sample is outlined in Fig. 1.

Fig. 1 Flowsheet outlining the selection of the study sample
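The group-wise random selection described above (8 males and 2 females drawn from each SDQ group) can be sketched as a stratified draw. The cohort below is synthetic and exists only to make the example runnable; the field names, the sex split within each group, and the fixed seed are assumptions, not study data.

```python
import random
from collections import Counter


def select_balanced_sample(participants, rng, n_male=8, n_female=2, cutoff=11):
    """Randomly draw n_male men and n_female women from each SDQ group."""
    selected = []
    for abnormal in (False, True):
        for sex, n in (("M", n_male), ("F", n_female)):
            pool = [p for p in participants
                    if (p["sdq"] >= cutoff) == abnormal and p["sex"] == sex]
            # random.Random.sample raises ValueError if the pool is too small.
            selected.extend(rng.sample(pool, n))
    return selected


# Synthetic cohort for illustration only (not the study data):
# 15 participants below the cut-off and 19 at or above it.
cohort = (
    [{"id": f"N{i}", "sex": "M", "sdq": 5} for i in range(11)]
    + [{"id": f"N{i}", "sex": "F", "sdq": 5} for i in range(11, 15)]
    + [{"id": f"A{i}", "sex": "M", "sdq": 15} for i in range(15)]
    + [{"id": f"A{i}", "sex": "F", "sdq": 15} for i in range(15, 19)]
)
sample = select_balanced_sample(cohort, random.Random(0))
counts = Counter((p["sdq"] >= 11, p["sex"]) for p in sample)
```

Drawing within each sex-by-group stratum keeps the proportion of women identical in the two groups, which is the stated purpose of the selection step.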

Scoring the VFSS Studies Using the DIGEST

The DIGEST is a measure of global swallowing severity that uses subscores of swallowing safety and efficiency to arrive at a global grade of swallow function [23]. The DIGEST was originally developed using the National Cancer Institute’s Common Terminology Criteria for Adverse Events for use in the HNC population. A panel of 9 expert clinician-scientists reviewed 100 VFSS that were completed using a standardized protocol. They assigned severity grades for both constructs of pharyngeal dysphagia (i.e., Safety and Efficiency) as well as an overall Total score for each VFSS. All three grades are scored using a number from 0 to 4, which can be interpreted as follows: 0, no impairment; 1, mild; 2, moderate; 3, severe; and 4, life-threatening. Each bolus administered during the study is first scored using the PAS, originally developed by Rosenbek and colleagues [33]. If a PAS score of 5 or more is assigned, DIGEST safety modifiers of pattern and amount are then applied. The amount of penetrated/aspirated material must be rated as “trace” (i.e., “faint coating, droplets, or trickle of barium on/below TVF”), “neither trace nor gross,” or “gross” (i.e., “>25% bolus volume”) using the DIGEST guidelines. The frequency, pattern, and amount of penetrated/aspirated material are then used to determine the Safety grade. In addition, each bolus is scored for percentage of pharyngeal residue after the initial swallow attempt. The amount of residue is scored on a four-point scale (“<10%, minimal to no residue,” “10–49%, less than half residue,” “50–90%, majority residue,” or “>90%, near complete residue”). The bolus consistencies with the maximum amount of pharyngeal residue are used to determine the Efficiency grade. After both the Safety and Efficiency grades have been determined, the Total grade can be determined using the interaction of these two subgrades.
More recently, the scoring for the DIGEST was amended to address concerns about the definition of mild safety impairment [34]. The DIGEST-v2 was used for the current study. As described above, the standardized DIGEST protocol includes two trials each of 5 mL, 10 mL, and self-administered cup sip volumes of thin liquid barium, barium pudding, and a cracker coated in barium paste for a total of ten bolus trials [23]. In contrast, our protocol included fewer thin liquid trials but a greater range of textures, including both honey and nectar-thick liquids for a total of five consistencies rather than three, and used only measured boluses for a total of nine bolus trials. After consultation with one of the developers of the DIGEST protocol, we adjusted the rules of the DIGEST protocol in the following manner: (a) where penetration or aspiration on more than 50% of thin liquid trials was required for the classification “chronic,” our protocol was based on 2 or more out of 3 trials rather than 4 or more out of 6 trials; (b) both nectar and thin liquids were defined as a “liquid” and these were considered a single “consistency” for both the safety and efficiency scores when findings were scored for one or more consistencies.
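Two pieces of the scoring arithmetic described above lend themselves to a short sketch: binning a residue percentage into the four-point residue scale, and finding the smallest number of trials that exceeds 50% of a given trial count. This is not an implementation of the DIGEST grading rules; the function names are hypothetical, and the treatment of non-integer percentages at the category boundaries is an assumption.

```python
def residue_category(percent: float) -> str:
    """Map a residue percentage onto the four-point residue scale quoted in the text."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 10:
        return "<10%, minimal to no residue"
    if percent < 50:  # assumption: values between 49 and 50 fall in this bin
        return "10-49%, less than half residue"
    if percent <= 90:
        return "50-90%, majority residue"
    return ">90%, near complete residue"


def min_events_exceeding_half(n_trials: int) -> int:
    """Smallest count of trials that is strictly more than 50% of n_trials."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return n_trials // 2 + 1
```

With 3 thin liquid trials the threshold is 2 events, and with 6 trials it is 4 events, which matches adjustment (a).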

Two licensed, certified speech-language pathologists (SLPs) with a specialty interest in dysphagia and more than 10 years of clinical experience (D.J.G. and R.K.B.) rated the VFSS studies using the DIGEST scoring form. At the time that the study was conducted, DIGEST trainings had not been developed. The training for the study included reading the original validation article for the DIGEST [23], attending a webinar about the DIGEST offered by M.D. Anderson, and participating in a series of training sessions where VFSS recordings were reviewed and the protocol applied in order to reach consensus. This was undertaken as part of an ongoing quality improvement process at our facility to use validated assessment instruments in clinical practice. The scoring guidelines for the DIGEST-v2 had not been published at the time of analysis and these guidelines were learned from a presentation at the Dysphagia Research Society [35]. The two raters were blinded to the identity of each participant and viewed the studies in a random order. Both raters were randomly assigned 10% of the sample (i.e., two studies) to score twice to allow calculation of intrarater reliability. After viewing each study in its entirety, each rater was also asked to assign an overall severity score using a 100-mm visual analog scale (VAS). The use of a VAS as a measurement technique is well validated and has been widely used in clinical practice and research in the behavioral and social sciences, including the measurement of voice- and swallowing-related characteristics [36,37,38,39]. Each rater marked an “X” on a 100-mm VAS indicating how they would rate that individual’s swallowing function overall, from “Not impaired at all” to “The most severe impairment possible.” Using the scores provided by the two raters, two other members of the study team (“coders”) independently used the DIGEST protocol to generate a Safety, Efficiency, and Total grade for each participant. Both coders were licensed SLPs (A.D.P. and M.N.) and had received the same training described above for the two raters. Both coders were also randomly assigned 10% of the sample (i.e., two studies) to score twice to allow calculation of intrarater reliability for coding.

Statistical Analysis

Statistical analysis was performed using SPSS, version 25 (IBM, Armonk, New York, 2017). Interrater and intrarater reliability were reported as a percentage of agreement, a weighted kappa statistic, and Kendall’s tau [40,41,42]. Weighted kappa was calculated using the IBM SPSS (2015) extension software and the strength of the agreement was interpreted using Landis and Koch’s guidelines [43], where 0.21–0.40 indicates fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 almost perfect agreement. Although the DIGEST is an ordinal scale, previous research has reported mean scores for the measure [44]. After calculation of interrater and intrarater reliability statistics, the two DIGEST scores were averaged to create a single score for each participant for subsequent analyses. Averaging the two scores resulted in a single ordinal score ranging from 0 to 4 in 0.5-point increments for each of the DIGEST subscales. Participants were divided into two groups based on their SDQ score (i.e., under 11 = “normal”; 11 or more = “abnormal”) according to the guidelines for that instrument [18]. Between-group comparisons were made using an independent samples t test for continuous variables, the median test for ordinal variables, and Fisher’s exact test for binary variables. Correlations were calculated using Pearson correlations for continuous variables and Spearman correlations for ordinal variables. Given concerns about combining ordinal scores from multiple raters [45], the correlations based on the combined score were also compared with those obtained from each rater in isolation in order to confirm the findings. An alpha level of 0.10 was used to reduce the risk of a type 2 error. All tests were two-tailed.
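The agreement statistics named above can be illustrated with a short, dependency-free Python sketch. The analysis itself was run in SPSS; the code below is not that implementation. It assumes linear disagreement weights, because the text does not state which weighting scheme was used, and the function names are hypothetical.

```python
def percent_agreement(rater1, rater2):
    """Percentage of paired ratings that are identical."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return 100.0 * matches / len(rater1)


def weighted_kappa(rater1, rater2, categories):
    """Cohen's weighted kappa with linear weights (an assumed weighting scheme)."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    k, n = len(categories), len(rater1)
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j] / n
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagree_obs / disagree_exp


def landis_koch(kappa):
    """Verbal label for kappa following Landis and Koch's bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For PAS ratings the `categories` argument would be the eight ordered scale points, for example `list(range(1, 9))`.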

Results

Aim 1: Intra- and Inter-Rater Reliability for the DIGEST and Association with Severity Scores

The 20 VFSS videos were scored independently by two raters. The two raters scored each of the nine boluses using the 8-point PAS scale and the DIGEST’s 4-point residue scale. With regard to PAS scores, the two raters showed complete agreement for the majority of the 180 VFSS clips (149/180, 83%). There were disagreements of a single point on the PAS for 24 clips (13%) and more than a single point for 7 clips (4%), as shown in Table 1. There was substantial agreement between the two raters for PAS ratings, weighted κ = 0.739 (95% CI, 0.628 to 0.850), p < 0.001, Kendall’s tau = 0.730, p < 0.001. Residue ratings also had high levels of agreement with complete agreement for the majority of the 180 clips (157/180, 87%). There were disagreements of a single point on the residue scale for 22 clips (12%) and more than a single point for 1 clip (0.6%), as shown in Table 2. Once again there was substantial agreement between the two raters for residue ratings, weighted κ = 0.769 (95% CI, 0.689 to 0.849), p < 0.001, Kendall’s tau = 0.836, p < 0.001. Both raters had been randomly assigned 10% of the sample (i.e., two studies) to score twice. Both raters showed perfect intrarater agreement for all 18 of their PAS scores across the two studies, weighted κ = 1.000 (95% CI, 1.000 to 1.000), p < 0.001. For the residue ratings, Rater 1 demonstrated almost perfect agreement, with identical scores for 17/18 (94%) of the ratings, weighted κ = 0.931 (95% CI, 0.803 to 1.060), p < 0.001, and Rater 2 demonstrated perfect agreement, weighted κ = 1.000 (95% CI, 1.000 to 1.000), p < 0.001.

Table 1 Comparison of PAS scores by rater
Table 2 Comparison of residue scores by rater

The two raters each assigned a global severity score for each study using a 100-mm VAS. There was a strong, significant correlation between VAS ratings between the two raters, r = 0.890, p < 0.001.

After the PAS and residue ratings were generated, two coders then independently used the PAS and residue ratings to assign DIGEST scores based on that instrument’s guidelines. Interrater reliability was calculated for the generation of the Safety, Efficiency, and Total DIGEST scores. There was perfect agreement between the two coders for all of the DIGEST-Safety (40/40, 100%), DIGEST-Efficiency (40/40, 100%), and DIGEST-Total scores (40/40, 100%), weighted κ = 1.000 (95% CI, 1.000 to 1.000), p < 0.001, Kendall’s tau = 1.000, p < 0.001. Intrarater reliability was also calculated for two randomly selected repeated studies. Intrarater reliability for both coders was also perfect (40/40, 100%), weighted κ = 1.000 (95% CI, 1.000 to 1.000), p < 0.001.

As there had been discrepancies between the two original raters for the PAS and residue ratings, this also caused differences in the DIGEST scores for some individuals. For the DIGEST-Safety scale, there was perfect agreement for 14/20 (70%) of the ratings, a difference of 1 point for 5/20 (25%), and one case of a difference of more than 1 point (5%), as shown in Table 3. For the DIGEST-Efficiency scale, there was perfect agreement for 18/20 (90%) of the ratings and a difference of 1 point for 2/20 (10%), as shown in Table 4. For the DIGEST-Total scale, there was perfect agreement for 16/20 (80%) of the ratings and a difference of 1 point for 4/20 (20%), as shown in Table 5.

Table 3 Comparison of DIGEST-Safety scores by rater
Table 4 Comparison of DIGEST-Efficiency scores by rater
Table 5 Comparison of DIGEST-Total scores by rater

Aim 2: Comparison of Individuals with Normal and Abnormal SDQ Scores

The background data for the 20 participants are presented in Table 6. The study sample was predominantly male (80%) with an average age of 71.05 years (± 10.87 years), and an average PD duration of 9.1 years (± 5.22 years). Most were eating an unrestricted diet (75%), had not previously had a VFSS (80%), and had no recent history of pneumonia (90%). There were no significant differences in background characteristics of the two groups, with one exception: those with a normal SDQ had a significantly higher mean BMI (29.75 ± 4.88) than those with an abnormal SDQ (25.26 ± 3.14, p = 0.025).

Table 6 Demographic and clinical characteristics of all participants and compared by group

For the subsequent analyses, the DIGEST scores from the two raters were averaged so that each of the 20 participants had a single score ranging from 0 to 4 for each of the three DIGEST subscales, as shown in Table 7. The VAS scores from both raters were also averaged to create a single score for each participant. As shown in Table 8, the median score for each of the three DIGEST subscales did not differ significantly by group when compared using a median test. The prevalence of abnormal findings was compared between the two groups in a binary fashion with a score of 0 for each DIGEST subscale being considered “normal” and any other score being “abnormal.” As shown in Table 8, abnormal findings occurred commonly in both groups and did not differ significantly. The VAS ratings for both groups were also compared. Those with an abnormal SDQ had a higher mean VAS rating than those with a normal SDQ score (23.20 ± 26.44 vs. 14.05 ± 12.59, respectively) but this did not differ significantly.

Table 7 Mean DIGEST scores for all participants and compared by group
Table 8 Median DIGEST scores for all participants and comparison of abnormal scores by group
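The averaging and binary classification steps described above can be sketched in a few lines of Python. The function names are hypothetical; the rules themselves (averaging two integer grades from 0 to 4, and treating any score other than 0 as abnormal) follow the text.

```python
def average_digest_grade(rater1: int, rater2: int) -> float:
    """Average two DIGEST grades (0-4), giving a value in 0.5-point steps."""
    for grade in (rater1, rater2):
        if grade not in (0, 1, 2, 3, 4):
            raise ValueError("DIGEST grades are integers from 0 to 4")
    return (rater1 + rater2) / 2


def is_abnormal(averaged_grade: float) -> bool:
    """A score of 0 is 'normal'; any other score is 'abnormal'."""
    return averaged_grade > 0
```

Under this rule, a participant graded 0 by one rater and 1 by the other has an averaged score of 0.5 and is therefore counted as abnormal.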

Aim 3: Association Between Patient Characteristics, Study Scales, and DIGEST Scores

Spearman correlations were used to examine the association between the ratings on the DIGEST and the characteristics of the study participants as well as their scores on the SDQ and the VAS. As shown in Table 9, three of the study variables were significantly associated with DIGEST scores. Older age was moderately associated with worse DIGEST-Safety scores (rs = 0.453, p = 0.045). On the FOIS, a less restricted diet was moderately associated with better DIGEST-Efficiency scores (rs = −0.474, p = 0.035) and better DIGEST-Total scores (rs = −0.433, p = 0.057). Only one of the SDQ scores was significantly associated with any of the DIGEST ratings. Higher Total scores on the SDQ were associated with worse DIGEST-Efficiency scores (rs = 0.388, p = 0.091). In contrast, the global VAS ratings from the VFSS showed statistically significant associations with all three of the DIGEST scales. There were strong, positive associations between global VAS ratings and DIGEST-Safety (rs = 0.793, p < 0.001), DIGEST-Efficiency (rs = 0.711, p < 0.001), and DIGEST-Total (rs = 0.808, p < 0.001) scores. These associations were confirmed for each of the two raters individually and therefore did not seem to be an artifact from averaging the DIGEST scores for the participants.

Table 9 Correlation between patient characteristics and study scales with DIGEST scores
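For readers less familiar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a minimal pure-Python illustration of that definition; it is not the SPSS routine used for the analysis, it does not compute p values, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Tie handling matters here because averaged DIGEST scores take only a small number of distinct values, so many participants share the same rank.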

Discussion

One of the aims of this study was to examine the validity of the DIGEST as a global measure in the PD population. Even though the DIGEST was originally developed for use in the HNC population, the metrics that it uses to generate a summary score are widely used across populations, suggesting that the DIGEST may have broader applicability. The frequency, pattern, and amount of penetrated or aspirated material are calculated to determine an overall safety grade using the PAS [19], a measure that is in widespread clinical and research use for adults with dysphagia of varying etiologies [20, 22]. The maximum percentage of pharyngeal residue and bolus consistency is used to determine an efficiency grade and a number of similar ordinal scales exist for rating pharyngeal residue in a similar fashion [46]. To date, several studies have used the DIGEST to classify dysphagia in patients with neurologic disease. Using a cut-off score of 1 to represent abnormal swallowing, previous studies have used the DIGEST to estimate the prevalence of dysphagia in cohorts of patients with OPMD [27] and also to test the predictive value of swallowing-related items on the ALS Functional Rating Scale-Revised [24]. Two interventional case studies have reported the impact of a respiratory-retraining program for an individual with anoxic brain injury [26] and PD [28] and used the DIGEST to score fiberoptic endoscopic evaluation of swallowing (FEES) studies performed before and after treatment. Improvements in DIGEST scores mirrored other improvements in swallowing, respiration, and cough-related measures [26, 28]. Plowman and colleagues [25] used the DIGEST as an outcome measure in a randomized controlled trial of expiratory muscle strength training in 48 individuals with ALS. There was a significant improvement in the proportion of individuals with abnormal Total and Efficiency scores on the DIGEST but not on the Safety subscale. 
These improvements were mirrored by a significant improvement on the FOIS, although no significant difference in Eating Assessment Tool scores was found. The current study adds to this body of literature and shows additional preliminary support for the use of the DIGEST in other populations. In terms of background characteristics of the participants, age was associated with significantly reduced safety scores and there was also a significant association between efficiency scores and diet. These findings are consistent with previous research [17, 25]. The raters’ VAS scores of overall dysphagia severity on the VFSS were significantly associated with all three DIGEST scores. These findings provide additional support for the DIGEST as a potentially valid measure for individuals with PD.

In this study, we applied the DIGEST protocol to rate boluses that had been administered according to a previously described standardized VFSS protocol [17]. It is important to note that we did not utilize the DIGEST protocol as outlined in the original validation study. The standardized DIGEST protocol includes two trials each of 5 mL, 10 mL, and self-administered cup sip volumes of thin liquid barium, barium pudding, and a cracker coated in barium paste [23]. In contrast, our protocol included fewer thin liquid trials but a greater range of textures, including both honey and nectar-thick liquids, and used only measured boluses. According to the developers, the protocol should ideally include 5–6 trials of a thin liquid bolus, and work on validating other bolus protocols is a future topic for investigation [34]. It is unclear how much DIGEST scores might be affected by the use of a different protocol and whether findings can be directly compared across studies if different protocols are used [23]. Our rationale for using the DIGEST in this study was to generate summary scores for safety, efficiency, and overall severity, and we found the DIGEST to be well suited to this purpose. The authors have stated that deviation from the protocol is expected in “extreme cases” and that the DIGEST scores are robust even if trials are skipped or compensations are introduced early in the study for safety reasons [34], but this remains for further investigation.

The other aims for this study were to examine the predictive value of the SDQ as a screening tool. There is a considerable body of literature on dysphagia screening and the importance of screening is widely agreed [47, 48]. “Gold standard” instrumental exams are not only expensive but also not always readily available, which may delay preventative measures from being implemented [9]. In the acute stroke population, the value of dysphagia screening has been associated with reduced pneumonia rates, shortened patient length of stay in the hospital, and reduced hospital costs [49,50,51]. Even in the acute stroke population, however, issues with the accuracy of dysphagia screeners caused the Joint Commission to retire dysphagia screening as a performance indicator for Primary Stroke Center certification in 2010 [52]. There is a similar need for outpatient dysphagia screening protocols in populations known to be at-risk for the negative consequences of dysphagia, such as those with PD. Accurate screening protocols would allow healthcare providers to make appropriate and timely referrals for evaluation to prevent costly and potentially life-threatening complications. Despite considerable research, the optimal screening protocol for outpatient neurogenic populations remains to be defined. There are a large number of general and population-specific dysphagia surveys that have been validated in the literature [53]. Surveys and patient-report tools are a desirable method of screening as they are low cost, easy to administer, and (by definition) focus on the concerns of the individual patient.

At our institution, we have used the SDQ as a screener for PD patients but have questioned its accuracy. In a recent investigation, we compared the SDQ to two other screening tools in a cohort of PD patients seen at our institution and found that none of the three was predictive of reduced airway protection on VFSS [17]. The previous study had a number of limitations, however, including the fact that airway protection was the only measure of swallowing function that was assessed. By comparing more global measures of safety, efficiency, and overall swallowing impairment in a sub-selected cohort and by using the two subscales of the SDQ, we hoped to find that the SDQ was more predictive of objective swallowing status. Unfortunately, we found no significant group differences for any of the DIGEST scores when individuals were grouped by the SDQ cut-off score of 11. When we examined the SDQ scores in a continuous fashion, the SDQ total correlated with only one of the VFSS measures: higher SDQ scores were associated with worse swallowing efficiency. As such, it would appear that the SDQ is more predictive of abnormal swallowing strength on VFSS than of swallowing safety. In further examining correlations between the subscales, the total SDQ score had a weak correlation with PAS scores regardless of whether the worst PAS score across all trials (rs = 0.172, p = 0.044) or the average PAS score from all boluses (rs = 0.214, p = 0.012) was used. In contrast, the SDQ was more strongly associated with efficiency measures such as diet scores on the FOIS (rs = −0.388, p < 0.001) and swallowing efficiency on the TWST (r = −0.351, p < 0.001). Given the concerns about swallowing safety in the PD population, the accuracy of the SDQ in fulfilling its originally stated purpose of identifying those in need of an objective swallowing evaluation remains in doubt. This finding is consistent with other studies that have demonstrated poor associations between subjective complaints and objective dysphagia characteristics in this population [5, 10, 14, 54].
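The two analytic approaches described above, grouping by the cut-off score and treating the SDQ total as a continuous variable, can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function names are hypothetical, Spearman's coefficient is computed as a Pearson correlation on tie-averaged ranks, and treating a score of exactly 11 as falling in the higher group is an assumption, since the text does not state on which side of the cut-off that score falls.

```python
from statistics import mean

SDQ_CUTOFF = 11  # cut-off named in the text; which side a score of 11 falls on is assumed


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def split_by_cutoff(sdq_scores, outcomes, cutoff=SDQ_CUTOFF):
    """Split an outcome measure into groups below and at-or-above the cut-off."""
    below = [o for s, o in zip(sdq_scores, outcomes) if s < cutoff]
    at_or_above = [o for s, o in zip(sdq_scores, outcomes) if s >= cutoff]
    return below, at_or_above
```

Given paired lists of SDQ totals and an outcome such as PAS scores, `spearman(sdq, pas)` returns the rank correlation and `split_by_cutoff(sdq, pas)` returns the two groups that a between-group test would then compare. The sketch does not compute p-values.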

It is unclear why our findings differed from those of the original validation study for the SDQ. Our sample was similar in terms of age, gender, and disease severity to that of the original sample and actually had a longer disease duration (9.1 vs. 6.7 years) [18]. One key difference is that the original validation study for the SDQ used clinical swallowing evaluations and FEES to assess dysphagia rather than VFSS. The only significant between-group difference in our study was BMI. This suggests that the SDQ may be sensitive to self-perceptions of mealtime difficulties and reduced eating efficiency that may place the individual at risk for inadequate nutrition and may be most appropriate as a screening measure for this purpose. It does appear, however, that screening measures based on self-report cannot be considered trustworthy for predicting abnormal airway protection on VFSS in the PD population. It has been argued that VFSS itself may not be sufficiently sensitive for identifying early changes in swallowing function in individuals with PD. Jones and Ciucci [55] found that high-resolution manometry (HRM) and a swallowing questionnaire item relating to problems with saliva were better than VFSS in differentiating individuals with PD from controls. Like ours, their study included individuals with early and mid-stage PD, and their findings also suggest that changes in swallowing pressure and efficiency are most noticeable at this stage in the disease’s development. One alternative explanation for the lack of association between patient-report measures and objective measures of swallowing function is that the objective measures themselves may not be sufficiently sensitive to subtle changes in swallowing physiology that occur early in the disease process. The structured nature of tasks on VFSS, including a bolus hold and a cued swallow, is well known to affect the timing of swallowing physiology [56, 57], and this may be particularly facilitative for individuals with PD [58]. In addition, the mealtime difficulties that many individuals experience extend beyond those related to swallowing alone [59], and it is possible that survey measures may more accurately capture some of these global difficulties. At our institution, therefore, we have changed how we utilize the SDQ. Higher scores on the SDQ indicate that an individual is experiencing dysphagia symptoms that need to be assessed but not necessarily with an objective evaluation. A clinical swallow evaluation is used to better assess the nature of the problem and determine whether strategies are effective. If not, an objective evaluation using either VFSS or FEES is then performed.
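The stepwise use of the SDQ described above can be written out as a short decision function. This is an illustrative sketch under stated assumptions rather than a clinical protocol: the names are hypothetical, a total of 11 or more is assumed to count as a "higher" score, and the handling of lower scores and of effective strategies is not specified in the text and is filled in here only so that the function is complete.

```python
from enum import Enum
from typing import Optional

SDQ_CUTOFF = 11  # cut-off named in the text; treating >= 11 as "higher" is an assumption


class NextStep(Enum):
    NO_REFERRAL_FROM_SDQ = "no referral triggered by the SDQ"  # assumption, not in the text
    CLINICAL_SWALLOW_EVALUATION = "clinical swallow evaluation"
    CONTINUE_STRATEGIES = "continue effective strategies"  # assumption, not in the text
    INSTRUMENTAL_EVALUATION = "objective evaluation (VFSS or FEES)"


def next_step(sdq_total: float, strategies_effective: Optional[bool] = None) -> NextStep:
    """Return the next assessment step for one patient.

    strategies_effective is None until a clinical swallow evaluation has
    been completed; afterwards it records whether strategies were effective.
    """
    if sdq_total < SDQ_CUTOFF:
        return NextStep.NO_REFERRAL_FROM_SDQ
    if strategies_effective is None:
        # Higher score: symptoms need assessment, starting with a clinical evaluation.
        return NextStep.CLINICAL_SWALLOW_EVALUATION
    if strategies_effective:
        return NextStep.CONTINUE_STRATEGIES
    # Strategies were not effective: proceed to an objective evaluation.
    return NextStep.INSTRUMENTAL_EVALUATION
```

For example, `next_step(14)` points to a clinical swallow evaluation, and `next_step(14, strategies_effective=False)` points to an objective evaluation.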

Limitations

This study has a number of limitations that should be acknowledged. For this secondary analysis from the larger dataset, we analyzed the data from a small number of randomly selected individuals. Most of the participants had mild dysphagia as indicated by both VAS and DIGEST scores, and this may have limited the applicability of our findings to those with more severe deficits. Our study did not utilize the VFSS protocol outlined in the original DIGEST validation study, and as such, the scores reported here may not be directly comparable to those from studies that used this protocol, particularly with regard to the safety subscale. Future work should examine the clinical and research utility of the DIGEST in larger and more diverse samples of participants. In addition, how VFSS compares with other objective methods of swallowing assessment, including FEES and HRM, which might be more sensitive to early changes, and whether more subtle changes are accurately captured in patient questionnaires are questions for future investigation.

Conclusion

The results of this study demonstrate that the SDQ did not accurately predict abnormal swallowing function on VFSS in a cohort of 20 adults with PD. Higher SDQ scores were associated with poorer swallowing efficiency but not swallowing safety or overall abnormality. This finding is consistent with previous research that has shown screening methods based on self-report to be poorly predictive of VFSS findings in the PD population. The DIGEST may be a useful tool to characterize global swallowing function in PD for research and clinical practice. In addition, the optimal objective method of screening and assessment in the PD population, particularly those in the early stages of the disease, remains to be defined.