Introduction

Identification of pharyngeal residue and its severity have been primary goals of the fiberoptic endoscopic evaluation of swallowing (FEES) since the procedure’s initial description [1]. Sensitivity and specificity for determination of presence or absence of pharyngeal residue during FEES were good [2], and the importance of this key component has been corroborated by subsequent state-of-the-art reports [3–5]. Over the past three decades, FEES has become a widely used objective instrumental examination to diagnose pharyngeal phase dysphagia, implement therapeutic interventions, and make recommendations for safe oral alimentation [6–10]. However, a reliable, validated, and generalizable pharyngeal residue severity rating scale for FEES has been lacking.

Pharyngeal residue is a clinical sign of potential prandial aspiration [11]. An accurate description of pharyngeal residue severity is, therefore, an important but difficult clinical challenge [12]. Many studies have reported findings of pharyngeal residue during FEES, but no attempt was made to determine pharyngeal residue severity patterns [6, 8, 13–43]. Simply stating that vallecula and pyriform sinus residue occurred is not helpful for either clinical or research purposes as patient information cannot be shared and efficacy of treatment interventions cannot be determined. The absence of a reliable, validated, and generalizable scale to determine vallecula and pyriform sinus residue severity patterns has made it difficult for clinicians to share patient information and determine benefits of therapeutic interventions.

The purpose of this systematic review is to evaluate the published literature since 1995 that investigated pharyngeal residue severity rating scales based on FEES. The research question this study addresses is: Do the qualitative and psychometric properties of published scales meet the criteria necessary for reliable, valid, and generalizable determination of vallecula and pyriform sinus pharyngeal residue severity?

Methods

Search Methodology

The following databases were searched for relevant studies: MEDLINE (OvidSP 1946–April, Week 3, 2015), Embase (OvidSP 1974–2015 April 20), Scopus (Elsevier), and the unindexed material in PubMed (NLM/NIH). All searches were conducted on April 20, 2015, except for the Scopus search, which was conducted on April 23, 2015. Supplementary efforts to identify studies included checking the reference lists of the articles retrieved.

The databases were searched using both controlled vocabulary words and synonymous free text words for the topic of interest (deglutition disorders, pharyngeal residue, endoscopy, videofluoroscopy, fiber optic technology, food, aspirate, etc.) and the outcomes of interest (scores, scales, grades, FEES, tests, etc.). The search strategies were adjusted for the syntax appropriate for each database/platform. The search was limited to articles published since 1995 and written in the English language. See “Appendix” for the MEDLINE search strategy.

Inclusion Criteria

This systematic review focused solely on studies that reported on completed and generalizable pharyngeal residue severity rating scales based on FEES. Scales limited to a specific disease process or diagnosis were not included.

Study Selection

Titles and abstracts of the retrieved articles were independently evaluated by two reviewers (PDN and SBL). Abstracts that did not provide adequate information regarding inclusion criteria were retrieved for full-text evaluation. The reviewers independently evaluated full-text articles and determined study eligibility. Disagreements were resolved by consensus agreement.

Data Extraction

The same two reviewers independently conducted study selection and data extraction. General qualitative characteristics of the studies collected included prospective or retrospective design, year of publication, severity definitions, scale type (binary, ordinal, or estimation), number of raters, experience of raters, and number of images rated. Psychometric properties collected were test/retest times, randomization of images, intra- and inter-rater reliability, and validity statistics.

Data Analysis

A qualitative summary composed of descriptive characteristics and psychometric properties of the scales used to evaluate pharyngeal residue severity ratings based on FEES was created for each included study (Table 1). Categories included were study design, sample size, severity definitions, scale type, number and experience of raters, randomization of images, and intra- and inter-rater reliability and construct validity.

Table 1 Qualitative summary of the 7 pharyngeal residue severity rating scales developed for general use based on FEES

Operational Definitions

  1. Pharyngeal residue was operationally defined as pre-swallow secretions and post-swallow food residue in the vallecula and pyriform sinuses not entirely cleared by a swallow [44].

  2. A pharyngeal residue severity rating scale was operationally defined as a tool that provides reliable and valid ratings of pharyngeal residue severity patterns; it is not intended to determine why residue occurs or to ascertain the timing of residue occurrence during swallowing.

  3. Scale types were operationally defined as binary (presence/absence of residue), ordinal (to capture progressively increasing amounts of residue), and estimation (amount of observed residue as an estimate of the percentage of the original bolus).

  4. The valleculae were anatomically defined as the spaces between the base of tongue and epiglottis [45].

  5. The pyriform sinuses were anatomically defined as the spaces formed on both sides of the pharynx between the fibers of the inferior pharyngeal constrictor muscle and the sides of the thyroid cartilage, and lined by orthogonally directed fibers of the palatopharyngeus muscle and pharyngobasilar fascia [45].

Results

The initial search retrieved 4388 potentially relevant citations. A total of 2215 duplicates were excluded. The resulting 2173 titles and abstracts were manually reviewed and an additional 2037 excluded. Review of the references in the full texts of the remaining 136 articles revealed 2 new citations. This brought the total number of articles included for eligibility assessment to 138 and after full text reviews a total of 7 studies specific to pharyngeal residue severity rating scales based on FEES were identified for inclusion in the qualitative analysis (Fig. 1).

Fig. 1 Flow chart of citations included in the systematic review illustrating the process through which relevant data were retrieved
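The citation counts reported in the Results can be checked with simple arithmetic. The short Python sketch below is illustrative only: the variable names are our own, and the number of articles excluded at full-text review is derived here from the reported totals rather than stated in the text.

```python
# Citation counts as reported in the Results text.
retrieved = 4388                    # citations from the initial search
duplicates = 2215                   # duplicates excluded
excluded_on_title_abstract = 2037   # excluded after manual title/abstract review
added_from_reference_lists = 2      # new citations found in reference lists
included = 7                        # studies in the qualitative analysis

# Each stage of the flow follows from the previous one.
screened = retrieved - duplicates
full_text_from_search = screened - excluded_on_title_abstract
assessed_for_eligibility = full_text_from_search + added_from_reference_lists

# Derived value (not stated in the text): exclusions at full-text review.
excluded_at_full_text = assessed_for_eligibility - included

print("Titles and abstracts screened:", screened)
print("Full texts from search:", full_text_from_search)
print("Assessed for eligibility:", assessed_for_eligibility)
print("Excluded at full-text review (derived):", excluded_at_full_text)
```

The intermediate totals agree with the reported figures of 2173 titles and abstracts, 136 remaining articles, and 138 articles assessed for eligibility.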

Pharyngeal Residue Severity Rating Scales Based on FEES

A qualitative summary of the descriptive characteristics and psychometric properties of the 7 pharyngeal residue severity rating scales based on FEES indicated major design flaws that precluded reliable, valid, and generalizable use of 6 scales [9, 11, 46–49]. These deficiencies included inadequate number of raters, no reporting of raters’ years of experience or training on scale, non-randomization of images, and missing statistical analyses of inter- and intra-rater reliability and construct validity. Only the Yale Pharyngeal Residue Severity Rating Scale [44], an anatomically defined and image-based tool, met all criteria necessary for a valid, reliable, and generalizable vallecula and pyriform sinus residue severity rating scale based on FEES (Table 1). Below are synopses of each of the reviewed scales.

Accumulated Oropharyngeal Secretions

Murray et al. [11] performed a retrospective binary analysis of pharyngeal residue severity based on 69 FEES videos. A gross estimation of volume of secretions in the valleculae and pyriform sinuses was made by two expert raters without specific training in use of the scale. Years of experience for the raters were not reported, videos were not randomized, and no test/retest reliability or construct validity was performed.

Marianjoy 5-Point and 3-Point Secretion Severity Scales

Donzelli et al. [46] performed a prospective estimation analysis of pharyngeal residue severity based on 104 FEES videos and used a 5-point estimation scale of vallecula and pyriform sinus severity (normal <10 % filled, mild 10–25 % filled, moderate >25 % filled, severe has laryngeal penetration, profound has aspiration) and a reduced 3-point scale (functional 0–25 % filled, severe has laryngeal penetration, profound has aspiration). Two expert raters with unknown years of experience and no specific training in use of the scales participated. Videos were not randomized and only inter-rater reliability was reported. A high correlation was reported for the 5-point and 3-point scales. No construct validity was reported for either the 5-point or 3-point scale.

Perception of Pharyngeal Residue Severity

Kelly et al. [9] performed a prospective ordinal analysis of pharyngeal residue severity based on 15 still FEES images and used a 5-point ordinal scale (none, coating, mild, moderate, and severe). Definitions of severity were not provided. A total of 15 expert raters with unknown years of experience and no training in use of the scale participated. Videos were randomized and re-rated 1 week later by all raters. Intra-rater reliability was good, i.e., 0.72, while inter-rater reliability was moderate, i.e., 0.51. No construct validity was reported.

Pooling Score

Farnetti [47] performed a retrospective ordinal analysis of pharyngeal residue severity using 520 FEES videos and used a 3-point ordinal scale (coated, minimally filled, and entirely filled). Definitions of severity were not provided. The authors did not report if all videos were analyzed. Neither the number of expert raters nor whether they were trained on use of the scale was reported. Videos were not randomized and no intra- or inter-rater reliability or construct validity was reported.

Inter- and Intra-Rater Reliability with FEES

Tohara et al. [48] performed a retrospective analysis of vallecula and pyriform sinus residue severity based on 10 FEES videos chosen by a single expert with unreported years of experience and used a 3-point ordinal scale (trace, small, large). Definitions of severity were not provided. There were 9 expert raters with a mean of 5.4 (±1.9) years of experience and no training on use of the scale. Videos were randomized and re-rated four times at one-week intervals. Overall intra-rater reliability ranged from 0.53 ± 0.04 to 0.78 ± 0.03 and inter-rater reliability ranged from 0.35 ± 0.04 to 0.46 ± 0.04. No construct validity was reported.

Detection Rates of Pharyngeal Residue

Park et al. [49] performed a retrospective binary estimation of vallecula and pyriform sinus residue severity, i.e., >15 % filled or not, based on 50 FEES videos. There was only a single expert rater with 7 years of experience but with no training on use of the scale. No randomization of images, intra- or inter-rater reliability, or construct validity was reported.

The Yale Pharyngeal Residue Severity Rating Scale

Neubauer et al. [44] performed a retrospective, ordinal, anatomically defined, and image-based analysis of vallecula and pyriform sinus severity rating patterns based on 25 still images from FEES which corresponded to a 5-point ordinal scale (none, trace, mild, moderate, and severe); see Tables 2 and 3 for definitions of vallecula and pyriform sinus residue severity patterns, respectively. Specifically, a total of 261 FEES evaluations were reviewed, 101 images were selected, and consensus agreement between two expert judges with a combined 26 years of performing and interpreting FEES allowed for selection of 25 potential final images, i.e., a no residue exemplar and three exemplars each of trace, mild, moderate, and severe vallecula and pyriform sinus residue. Hard-copy color images of the no residue, 12 vallecula, and 12 pyriform sinus images were randomized by residue location for hierarchical categorization by 20 raters trained at 18 different institutions from around the world, i.e., otolaryngology residents (n = 11), attending otolaryngologists (n = 5), speech-language pathologists (n = 3), and a physician assistant (n = 1). The raters had different durations of experience in performing and interpreting FEES evaluations (mean 8.3 years, range 2–27 years). Raters were grouped by years of FEES experience and training status. Years of experience indicated that 10 raters had ≤4 years (mean 2.8 years, range 2–4 years) and 10 raters had ≥5 years (mean 13.4 years, range 5–27 years). Training was done once, with random assignment of 10 raters to receive and 10 raters not to receive pre-rating training in determining vallecula and pyriform sinus pharyngeal residue severity ratings. Training included written definitions, visual depictions, verbal explanations, and clarifying questions/answers of the severity ratings. The no-training condition was limited to written definitions and visual depictions of the severity ratings.

Intra-rater test–retest reliability, inter-rater reliability, and construct validity for severity ratings for all images were performed by the same two expert judges and 20 raters, two weeks apart, and with the order of image presentations randomized. Analyses were done separately for vallecula and pyriform sinus locations. Residue ratings were excellent for intra-rater reliability for vallecula (κ = 0.957 ± 0.014) and pyriform sinus (κ = 0.854 ± 0.021); very good to excellent for inter-rater reliability for vallecula (κ = 0.868 ± 0.011) and pyriform sinus (κ = 0.751 ± 0.011); and excellent for validity for vallecula (κ = 0.951 ± 0.014) and pyriform sinus (κ = 0.908 ± 0.017) locations. More years of experience did not result in higher κ values for either vallecula (p = 0.20) or pyriform sinus (p = 0.23) residue ratings. Training did not result in higher κ values for either vallecula (p = 0.17) or pyriform sinus (p = 0.55) residue ratings.
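For readers less familiar with the κ statistic, the following Python sketch shows how an unweighted Cohen's κ could be computed for two rating sessions on the 5-point scale. It is a minimal illustration under stated assumptions: the text above does not specify which κ variant or weighting was used, the function name `cohen_kappa` is our own, and the example ratings are hypothetical and are not data from any reviewed study.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of items given the same category in both lists.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each list's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical test and retest ratings by one rater (illustration only).
test_session = ["none", "trace", "mild", "moderate", "severe", "trace", "mild", "severe"]
retest_session = ["none", "trace", "mild", "moderate", "severe", "mild", "mild", "severe"]

print("Illustrative intra-rater kappa:", round(cohen_kappa(test_session, retest_session), 3))
```

Because κ subtracts chance agreement from observed agreement, it is lower than simple percent agreement whenever the two sets of ratings differ. An ordinal scale could alternatively be analyzed with a weighted κ, which this sketch does not implement.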

Table 2 Definitions for severity of vallecula residue [44]
Table 3 Definitions for severity of pyriform sinus residue [44]

Discussion

This is the first report to systematically review the qualitative and psychometric properties of pharyngeal residue severity rating scales based on FEES. A summary of the qualitative characteristics and psychometric properties of the 7 pharyngeal residue severity rating scales based on FEES found methodological deficiencies that precluded reliable, valid, and generalizable use of 6 scales [9, 11, 46–49]. These deficiencies included lack of clear definitions, inadequate sample size and number of raters, no reporting of raters’ years of experience or training on scale, non-randomization of images, and missing statistical analyses of inter- and intra-rater reliability and construct validity.

Only the Yale Pharyngeal Residue Severity Rating Scale [44], an anatomically defined and image-based tool, met all criteria necessary for a valid, reliable, and generalizable vallecula and pyriform sinus residue severity rating scale based on FEES. For the first time, clinicians can accurately and reliably classify vallecula and pyriform sinus residue severity patterns using high-quality full-color images that correspond precisely with the severity ratings of none, trace, mild, moderate, and severe (Tables 2, 3). Severity ratings of trace and mild are indicative of lower aspiration risk, while severity ratings of moderate and severe are indicative of higher aspiration risk. This knowledge allows the clinician to decide if oral alimentation can continue and if therapeutic interventions should be started. Additionally, since no differences were found based on years of experience or prior training on use of the Yale Pharyngeal Residue Severity Rating Scale [44], proficiency is readily achievable in a short period of time by clinicians from different specialty areas and with different levels of expertise.

The generalizability and versatility of the Yale Pharyngeal Residue Severity Rating Scale [44] allows for both clinical advantages and research opportunities. Clinical uses include accurate diagnosis of vallecula and pyriform sinus residue severity, assessment of functional therapeutic change, and precise dissemination of information among clinicians. Research uses include tracking the progress of outcome measures for targeted swallowing interventions, supporting efficacy of specific interventions to reduce pharyngeal residue, longitudinal determination of morbidity and mortality associated with pharyngeal residue severity patterns in different patient populations, and improving the training and accuracy of FEES interpretation by students and clinicians.

Conclusion

This systematic review investigated the qualitative and psychometric properties of pharyngeal residue severity rating scales based on FEES. There is a need for a qualitatively and psychometrically reliable, validated, and generalizable pharyngeal residue severity rating scale that is anatomically specific, image-based, and easily learned by both novice and experienced clinicians. A total of 7 reports were identified, but 6 were of poor quality and failed to employ the qualitative and psychometric methods necessary for a robust pharyngeal residue severity rating scale. Only the Yale Pharyngeal Residue Severity Rating Scale [44] met all criteria necessary for reliable, valid, and generalizable determination of vallecula and pyriform sinus pharyngeal residue severity ratings based on FEES.