Videofluoroscopy swallowing studies (VFSSs) remain the most common clinical method for instrumental assessment of swallowing (dys)function. Different ways of interpreting VFSSs have been described and include descriptions of dysfunction [1–3]; binary ratings, i.e., whether a swallowing movement is present or absent [4–7]; and/or the use of rating scales to describe the degree of impairment [8–10]. In addition to such methods, researchers have developed computer software to make detailed temporal, distance, and biomechanical measures of swallowing physiology from VFSSs [11–14].

The interpretations of VFSSs, and the clinical decisions that are based on such an assessment, remain subjective because we have no consensus or standardized criteria regarding abnormality and/or what would be considered an “unsafe” swallow. The differing observations and measurements that can be taken from VFSS images are limitations to its generalizability and usefulness in both clinical and research settings. Comparisons across studies may not be achievable when there are wide variations in measurement and interpretation. Furthermore, many measures are time-consuming to undertake so do not lend themselves to a clinical setting or, indeed, to many research environments.

In a recent article [15] we suggested the need to establish a core set of swallowing functions that are pertinent to assess when evaluating dysphagia. A minimum data set that can quickly and easily capture crucial information about swallowing function is desirable. Such measures need to be an accurate representation of the swallow, which means ensuring that the measures are stable, reliable, and valid.

Stability considers the level of “agreement” of two or more swallows during a VFSS, i.e., whether the true score remains unchanged across multiple swallows or whether there are systematic biases, resulting in significantly different values, for any/all of the swallows being measured.

Reliability of a tool considers both agreement (as described above) and consistency, i.e., whether the tool is “dependable” and will measure only the trait of interest. Reliability investigates the degree of random error (due to chance) while acknowledging any systematic error (which is predictable and consistent) [16].

An accurate measure of swallowing must also be valid, i.e., it must measure what it is intended to measure. Construct validity is the degree to which a theoretical construct is measured (e.g., swallowing), while concurrent validity is the degree to which outcomes from one test will correlate with the outcomes of a criterion test, given at the same time [16].

It is important to recognize that reliability and validity of measurement tools may vary, depending on the population of interest. For example, the psychometric properties of such measures may be distinct for individuals with no swallowing difficulties compared to a clinical sample, such as people with head and neck (H&N) cancer. In people with H&N cancer, both the disease itself and the subsequent treatment(s) are associated with (often severe) dysphagia [15, 17–20]. Nevertheless, there may be a significant difference in the degree of dysphagia, depending on factors such as the site and/or size of the primary tumor and/or the type or extent of treatment(s). Presenting dysphagia may change in the same individual across the care pathway. Problems experienced before treatment may be more or less severe, or simply different, when compared with those recorded immediately and/or six months after treatment. These variations, both across and within individuals, highlight the need for accurate measurement of swallowing function and the importance of selecting tools that allow stable, reliable, and valid measurements to be made. Thorough investigation of the psychometric properties of tools is the only way to ensure such accuracy. Thus, the first aim of this study was to compare the stability, reliability, and validity of three different types of measures used to analyze the same VFSSs taken from a sample of patients with H&N cancer. These measures were (1) the presence or absence of a swallow disorder, (2) the Bethlehem Assessment Scale (BAS) [21], and (3) detailed biomechanical measures.

Prior researchers have identified differences in swallowing physiology across different bolus consistencies, resulting in different measurement outcomes such as increased tongue base to pharyngeal wall contact, increased oral and pharyngeal transit times, and increased cricopharyngeal opening for semisolids versus liquids [22–24]. A second aim was therefore to determine whether the psychometric properties of the tools differed when used to analyze swallowing of two bolus consistencies: liquids and semisolids. Using this approach, we aimed to identify key elements for inclusion when assessing dysphagia in a specific group of patients with H&N cancer.

Methods

Participants

Videofluoroscopy swallowing studies (VFSSs) were undertaken with 40 patients who had H&N cancer, three months after completing either radiotherapy (n = 10) or chemoradiotherapy (n = 30) for definitive treatment. These patients were sequentially recruited from the Peter MacCallum Cancer Centre (PMCC) and The Alfred Hospital, Melbourne, Australia. Thirty-two men and eight women were recruited (age = 42–80 years; mean = 59.5 years). Eighteen of the participants (45%) had been treated for primary tumors of the tonsil. Twelve (30%) had primary tumors within the base of tongue and ten participants (25%) had a tumor of the larynx. Just over half of the participants (52.5%) had large (T3–T4) [25] tumors, while 47.5% had smaller (T1–T2) tumors (Table 1). When examined against The Cancer Council Victoria’s statistics for H&N cancers in 2004 [26], this sample was considered representative of H&N cancer patients throughout the State, for both gender and age.

Table 1 Participant demographics

The variability in the site and size of tumors and the treatment received is due to the sequential nature of the participants’ recruitment. Participants ranged in feeding status, from being completely dependent on enteral nutrition to tolerating a normal diet with no restrictions (Table 1).

The study was approved by the Human Ethics Committees of PMCC, The Alfred Hospital, and La Trobe University, with written informed consent being obtained from all participants.

Procedure

VFSSs were conducted using a standard protocol. Participants were asked to complete three swallows each of a measured 3 ml of liquid barium and 3 ml of semisolid (fruit puree) mixed with barium sulfate. All boluses were administered via a 5 ml teaspoon. A bolus volume of 3 ml was chosen because it is small enough to reduce the risk of aspiration but large enough to elicit a swallow reflex [28]. Other researchers have demonstrated that a 3 ml volume is often one that patients can swallow throughout all stages of radiotherapy treatment [29–31].

Participants were seated upright and viewed in the lateral plane, with the fluoroscope focused on the lips anteriorly, the soft palate superiorly, the cervical vertebrae posteriorly, and the lower end of the cervical esophagus inferiorly. A 5-cent Australian coin was taped under the participant’s chin to correct for magnification, allowing for calibration during post-hoc analyses. Images were recorded on high-quality videotape, using either the analog Kay Swallowing Workstation (Kay Elemetrics, Lincoln Park, NJ) or a Sony (Model SVO-9500MDP) VHS videocassette recorder (Sony, Australia), depending on the hospital at which the procedure was conducted. Images were then digitized into .avi format using the video capture software “Virtual Dub” (GNU General Public License) to allow all post-hoc measurements to be made on a personal computer.
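The coin-based magnification correction can be illustrated with a minimal sketch. This is not the software used in the study: the function names are hypothetical, and the coin diameter of 19.41 mm is an assumed reference value that should be checked against the coin actually used.

```python
# Hypothetical calibration helpers (not the study's software).
# ASSUMPTION: an Australian 5-cent coin is 19.41 mm in diameter.
COIN_DIAMETER_MM = 19.41


def mm_per_pixel(coin_diameter_px: float,
                 coin_diameter_mm: float = COIN_DIAMETER_MM) -> float:
    """Scale factor from the coin's measured diameter in image pixels."""
    if coin_diameter_px <= 0:
        raise ValueError("coin diameter in pixels must be positive")
    return coin_diameter_mm / coin_diameter_px


def to_mm(distance_px: float, scale: float) -> float:
    """Convert an on-screen distance (pixels) to millimetres."""
    return distance_px * scale


# Illustrative numbers only: the coin spans 38.82 px on the digitized frame,
# and a structure is seen to move 50 px.
scale = mm_per_pixel(coin_diameter_px=38.82)
movement_mm = to_mm(50.0, scale)
```

Because the scale is derived separately for each recording, distances from studies made at different magnifications can be expressed in the same units.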

Measurements

Three types of measures were completed on all six swallows (three liquid, three semisolid) from each of the 40 participants: (1) the presence or absence of swallow disorders, (2) the BAS, and (3) detailed temporal, distance, and biomechanical measures (collectively referred to as “biomechanical measures”).

Presence or Absence of Swallowing Disorders

Numerous swallowing “disorders” are documented in the H&N dysphagia literature and 12 of these were selected for this study on the basis of their reported prevalence [4, 5, 7, 14, 30, 32–34]. These were judged as being “present” or “absent” during each swallow and were operationalized from the literature [24, 28, 32] as follows:

  • Poor bolus formation: material spreads around oral cavity and/or part of the bolus prematurely spills into the pharynx.

  • Prolonged oral transit: more than 1 second between initiation of the oral swallow (first posterior movement of the bolus from the hold position) and the bolus passing through the faucial arches and/or repeated tongue-pumping motion.

  • Reduced velopharyngeal closure: velopharyngeal closure is incomplete and/or material enters the nasal cavity and nasal regurgitation seen.

  • Delayed onset of swallow reflex: the head of the bolus is beyond the point where the lower edge of the mandible crosses the tongue base before the swallow is initiated (the first frame showing laryngeal elevation).

  • Base of tongue (BOT) and/or posterior pharyngeal wall (PPW) weakness: reduced posterior movement of the BOT with reduced/incomplete contact to the PPW.

  • Reduced laryngeal elevation: limited superior and/or anterior movement of the larynx during the swallow.

  • Reduced epiglottic inversion: absent or incomplete (remains horizontal/does not completely close off the laryngeal vestibule) tilting of the epiglottis during the swallow.

  • Reduced laryngeal vestibule closure: incomplete contact of the arytenoid to epiglottic base.

  • Pharyngeal residue: any portion of the bolus (more than trace) remains in the valleculae and/or pyriforms and/or BOT and/or PPW postswallow.

  • Cricopharyngeal muscle dysfunction: delayed opening when the bolus reaches the cricopharyngeus and/or residue in the pyriform sinuses after the swallow.

  • Laryngeal penetration: part of the bolus enters the larynx and remains at or above the level of the vocal folds.

  • Aspiration (including silent aspiration): part of the bolus enters the larynx and passes below the vocal folds into the subglottis.

The Bethlehem Assessment Scale (BAS)

The BAS has 11 domains, each of which is rated on a four-point ordinal rating scale, where 1 represents “normal” and 4 represents a “severe dysfunction” [21]. The 11 domains include lip function, tongue function, jaw function, soft palate elevation, swallow reflex, hyoid elevation, residue in valleculae, residue in pyriform sinuses, aspiration, pharyngeal wall function, and cricopharyngeal function. The BAS has been validated using an Australian population of people with motor neuron disease and has good interrater reliability [35]. Because measures from the BAS are separately rated over each of the 11 different swallowing domains (rather than one rating made of the swallow function as a whole), this was a suitable scale for use in the current study.

Biomechanical Measures

Many biomechanical measures have been used in the analysis of VFSSs [11, 12, 28]. In this study we chose those measures most frequently identified in the H&N cancer literature as being a problem following radiotherapy and for which the methods of analysis had been clearly described [12, 14, 36, 37]. Temporal measures were completed using frame-by-frame analysis of each swallow to identify the first and last frames that showed each movement of interest. Distance and biomechanical measures were completed using “IMAGE,” a public domain software program developed at the National Institutes of Health (Washington, DC) and available at http://rsb.info.nih.gov/nih-image. The methods for completing measures from VFSS images using the IMAGE program have been previously described [12, 13]. Table 2 lists the biomechanical measures used in this study, references for how to measure them, and an indication of the direction of better scores.

Table 2 Biomechanical measures

Statistical Analysis

The statistical analyses of data address three aspects of measurement: stability (whether the values obtained for the three swallows were significantly different), reliability (both agreement and consistency of observed swallows), and validity (specifically construct validity), across liquid and semisolid boluses.

Stability

Stability was analyzed for the categorical data (presence/absence of swallow disorders) using Cochran’s Q test. For all other data (BAS and biomechanical measures), stability across the three swallows was analyzed using a series of one-way repeated-measures analyses of variance (ANOVAs). The assumption of sphericity was examined using Mauchly’s test and, when violated (p < 0.05), the Greenhouse–Geisser correction was used to adjust the degrees of freedom. Post-hoc analyses, using the least significant difference (LSD) test, were conducted on variables (from BAS and biomechanical measures) with significant F values from the ANOVA models to identify which of the three swallows differed significantly.
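The two stability statistics can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the analysis code used in the study: the function names and the example data are hypothetical, the p value for Cochran’s Q is computed only for the three-swallow case (2 degrees of freedom, where the chi-square tail has the closed form exp(−Q/2)), and the ANOVA sketch omits Mauchly’s test, the Greenhouse–Geisser correction, and the post-hoc comparisons.

```python
import math


def cochran_q(rows):
    """Cochran's Q for rows of k binary ratings (1 = disorder present)."""
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n_total = sum(row_totals)
    denom = k * n_total - sum(t * t for t in row_totals)
    if denom == 0:
        raise ValueError("no within-participant variation; Q is undefined")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2) / denom
    df = k - 1
    # Closed-form chi-square tail probability is used only when df == 2.
    p = math.exp(-q / 2) if df == 2 else None
    return q, df, p


def rm_anova_f(rows):
    """One-way repeated-measures ANOVA: returns (F, df_effect, df_error)."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    cond_means = [sum(r[j] for r in rows) / n for j in range(k)]
    subj_means = [sum(r) / k for r in rows]
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


# Hypothetical data: one row per participant, one column per swallow.
disorder = [[1, 0, 0]] * 6 + [[1, 1, 1]] * 2 + [[0, 0, 0]] * 2
q, q_df, q_p = cochran_q(disorder)

scores = [[1, 2, 4], [2, 3, 4], [3, 5, 5], [4, 4, 6]]
f, df1, df2 = rm_anova_f(scores)
```

Participants whose three ratings are identical contribute nothing to Q, so only the participants whose rating changes across swallows drive the test.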

Reliability

For the categorical data (presence/absence of swallow disorders), the overall percentage of agreement was calculated for each variable by assigning “agreement” (i.e., the disorder is present/absent in all three swallows) or “nonagreement” (the disorder is present in some swallows but absent in others) to each of the 40 participants’ VFSSs. Reliability was considered to be acceptable across the three swallows for a variable if the percentage of agreement was 75% or above, indicating “good” agreement [16]. To assess both the level of agreement and the degree of correlation (reliability) across the three swallows for the BAS and biomechanical measures, the intraclass correlation coefficient, ICC(3,1), was used. Portney and Watkins [16] state that values of 0.75 or above indicate “good” reliability, so this value was chosen to represent “adequate” reliability in this study.
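Both reliability indices can be computed directly from a participants-by-swallows table. The following is a minimal sketch, not the study’s analysis code: the function names and data are hypothetical, and ICC(3,1) is taken in its usual two-way mixed, single-measure consistency form, (BMS − EMS) / (BMS + (k − 1)·EMS), where BMS is the between-participants mean square and EMS the residual mean square.

```python
def percent_agreement(rows):
    """Percentage of participants whose ratings are identical on every swallow."""
    agree = sum(1 for r in rows if len(set(r)) == 1)
    return 100.0 * agree / len(rows)


def icc_3_1(rows):
    """ICC(3,1) from rows of k repeated scores per participant."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    cond_means = [sum(r[j] for r in rows) / n for j in range(k)]
    subj_means = [sum(r) / k for r in rows]
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj
    bms = ss_subj / (n - 1)                 # between-participants mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical data: rows are participants, columns are swallows 1 to 3.
present = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0],
           [1, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 1]]
agreement = percent_agreement(present)      # 6 of 8 participants agree

scores = [[1, 2, 4], [2, 3, 4], [3, 5, 5], [4, 4, 6]]
icc = icc_3_1(scores)
acceptable = agreement >= 75.0 and icc >= 0.75
```

Because the swallow (column) effect is removed before the ratio is formed, this form of the ICC can remain high even when the mean score shifts systematically from one swallow to the next.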

Validity

Factor analysis was used as both a data reduction technique and as a measure of construct validity of the BAS and biomechanical measures. Principal axis factoring with oblique rotation was conducted on swallow 2 for both liquids and semisolids to formulate a pattern matrix and thereby identify the factor loadings for all variables across the two measurement scales. Extraction of factors was based on the eigenvalue greater than 1 rule [38]. Factor loadings greater than 0.4 were considered important to the interpretation of the factor structure, and, therefore, variables that had a poor loading on the factor (i.e., < 0.4) were removed from the analysis.
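The eigenvalue-greater-than-1 rule can be illustrated on a small correlation matrix. This sketch does not reproduce principal axis factoring or oblique rotation; it only shows how eigenvalues of a correlation matrix determine the number of factors retained. The function names and the 3 × 3 matrix are hypothetical, and the eigenvalues are obtained with a simple cyclic Jacobi rotation routine so that no external library is assumed.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                y = 2 * a[p][q] if diff >= 0 else -2 * a[p][q]
                theta = 0.5 * math.atan2(y, abs(diff))  # |theta| <= pi/4
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_retained(eigenvalues):
    """Eigenvalues that pass the eigenvalue-greater-than-1 rule."""
    return [ev for ev in eigenvalues if ev > 1.0]


# Hypothetical correlation matrix for three variables.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
eigs = symmetric_eigenvalues(corr)
retained = kaiser_retained(eigs)
variance_explained = 100.0 * sum(retained) / sum(eigs)
```

For a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that number gives the proportion of total variance attributed to the corresponding factor.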

Variables that clustered together (i.e., were “related” in some way) to form a factor were then analyzed using Cronbach’s alpha to establish the degree of correlation between these variables (internal consistency of each of the factors). Cronbach’s alpha > 0.75 was considered indicative of a strong relationship between variables loading on that factor (i.e., good internal consistency) [16].
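Cronbach’s alpha for the variables loading on one factor can be sketched as follows. The function name and data are hypothetical and serve only to illustrate the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the summed score); this is not the study’s analysis code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows are participants, columns are the k variables of a factor."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores on three variables that load on the same factor.
factor_scores = [[1, 2, 4], [2, 3, 4], [3, 5, 5], [4, 4, 6]]
alpha = cronbach_alpha(factor_scores)
good_internal_consistency = alpha > 0.75
```

In practice the variables would usually be brought to a common scale and direction before alpha is computed, since raw variances otherwise weight the variables unequally.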

The variables within each factor were also analyzed using Pearson’s product moment correlation coefficient (r) to establish which variables were highly correlated with each other (suggesting that they essentially measure the same aspect of the underlying trait) and which were poorly correlated with each other (suggesting that they are “related” but measure different aspects of the underlying trait). Each factor was then assigned a clinical “label” according to the common theme or theoretical construct that characterized that group of variables.

To assess the concurrent validity of the presence/absence variables, point-biserial correlations (r pb) with the factor scores derived from the factor analyses for liquid and semisolid factors were calculated. Only the presence/absence variables that were considered to be clinically related to each factor were analyzed. The purpose of this approach was to capture the relationship between the categorical variables and the other (BAS and biomechanical) measures, and the strength of this relationship was considered “good” to “excellent” for correlation coefficient values above 0.75 [16].
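The point-biserial correlation between a present/absent rating and a continuous factor score can be sketched as below. The function name and data are hypothetical; the formula used is the standard one, (M1 − M0)/s × sqrt(n1·n0/n²), with s the population standard deviation of all scores, which is numerically identical to Pearson’s r computed on the 0/1 coding.

```python
import math


def point_biserial(present, scores):
    """Point-biserial r between a 0/1 rating and a continuous score."""
    n = len(scores)
    group1 = [s for p, s in zip(present, scores) if p == 1]
    group0 = [s for p, s in zip(present, scores) if p == 0]
    if not group1 or not group0:
        raise ValueError("both categories must occur at least once")
    mean_all = sum(scores) / n
    sd_pop = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / sd_pop * math.sqrt(len(group1) * len(group0) / n ** 2)


# Hypothetical data: disorder absent (0) or present (1) and a factor score.
present = [0, 0, 1, 1]
factor_score = [1.0, 2.0, 3.0, 4.0]
r_pb = point_biserial(present, factor_score)
shared_variance = r_pb ** 2   # proportion of variance in common
```

Squaring the coefficient gives the shared variance, which is the quantity to inspect when asking how much of a factor is left unexplained by the dichotomous rating.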

Intra- and interrater reliability for all measures was examined, respectively, by 10% of all measures being repeated by the same examiner six weeks later and independently by a second examiner. The percentage agreement between each measure was calculated for the presence/absence measures and the BAS (BAS measures were considered to be in agreement if the scores were within one rating of each other). Intra- and interrater reliability was at least 96% and 92% (for the presence/absence measures) and 100% and 92% in agreement (for the BAS), respectively. For the biomechanical measures, using ICC, intrarater reliability was at least 83%, and the interrater reliability was at least 80%, except for the variable maximum PPW movement, which had a reliability of 72%.
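The “within one rating” criterion for BAS agreement can be written out explicitly. This is a minimal sketch with a hypothetical function name and made-up ratings, not the procedure’s actual implementation.

```python
def within_one_agreement(first, second):
    """Percentage of paired ordinal ratings that differ by at most one scale point."""
    if not first or len(first) != len(second):
        raise ValueError("rating lists must be non-empty and of equal length")
    hits = sum(1 for a, b in zip(first, second) if abs(a - b) <= 1)
    return 100.0 * hits / len(first)


# Hypothetical BAS ratings (1 = normal, 4 = severe dysfunction) for five
# swallows, scored on two occasions or by two examiners.
rating_a = [1, 2, 3, 4, 1]
rating_b = [1, 3, 1, 4, 2]
agreement = within_one_agreement(rating_a, rating_b)
```

This criterion is more lenient than exact agreement, which should be kept in mind when comparing the BAS figures with those for the presence/absence measures.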

Results

Stability

The descriptive statistics for all variables and swallows are shown in Tables 3 and 4 for liquid and semisolid consistencies, respectively. Two domains of the BAS—lip function and jaw function—were excluded from all subsequent analyses, because all swallows (for all participants and both consistencies) received a value of 1 (i.e., normal) for these factors, indicating zero variance. There were three variables for the liquid consistency and six variables for the semisolid consistency, where significant differences in mean values existed across the three swallows.

Table 3 Descriptive statistics and reliability of the three swallows—liquids
Table 4 Descriptive statistics and reliability of the three swallows—semisolids

Liquids

For the liquid consistency, the following variables were significantly different across the three swallows: (1) pharyngeal wall function [BAS; F(2, 76) = 3.40, p = 0.039]; (2) duration of cricopharyngeal opening [biomechanical measure; F(2, 76) = 4.08, p = 0.021]; and (3) number of swallows required to clear the bolus [biomechanical measure; F(1.31, 49.59) = 6.02, p = 0.011]. Post-hoc analysis revealed that for all three variables, swallow 1 was significantly different than swallows 2 (all p < 0.05) and 3 (all p < 0.05). Higher (i.e., worse) scores were rated on swallow 1 for pharyngeal wall function, indicating that this variable improved as the swallow study progressed, whereas lower (i.e., better) scores were rated on swallow 1 for the duration of cricopharyngeal opening and number of swallows required to clear the bolus, indicating that these variables became worse as the swallow study progressed.

Semisolids

The following variables were significantly different across the three swallows for the semisolid consistency: (1) poor bolus formation [presence/absence measure; Cochran’s Q (2) = 6.0, p = 0.05]; (2) prolonged oral transit [presence/absence measure; Cochran’s Q (2) = 10.29, p = 0.006]; (3) pharyngeal residue [presence/absence measure; Cochran’s Q (2) = 9.33, p = 0.009]; (4) residue in valleculae [BAS; F(1.67, 56.84) = 4.14, p = 0.027]; (5) pharyngeal wall function [BAS; F(2, 68) = 5.93, p = 0.004]; (6) duration of cricopharyngeal opening [biomechanical measure; F(2, 68) = 12.38, p < 0.001]; (7) pharyngeal area at maximum constriction [biomechanical measure; F(1.641, 55.79) = 3.73, p = 0.038]; (8) maximum width of cricopharyngeal opening [biomechanical measure; F(2, 68) = 11.49, p < 0.001]; and (9) percent of pharyngeal residue [biomechanical measure; F(1.523, 51.78) = 10.83, p < 0.001].

For the variables presence of poor bolus formation and prolonged oral transit, there were more incidences of oral stage disorders for swallow 1 compared with swallows 2 and 3, indicating that these variables improved as the study progressed. For the variable presence of pharyngeal residue, there was less pharyngeal residue present for swallow 3 than for swallows 1 and 2, indicating less residue as the study progressed. Similarly, post-hoc analysis revealed that for residue in valleculae (BAS), swallows 1 and 3 were significantly different (p = 0.023), again with scores indicating less residue as the study progressed. Swallow 3 had significantly less pharyngeal area at maximum constriction than did swallows 1 (p = 0.033) and 2 (p = 0.031), and for the percent of pharyngeal residue, swallow 1 had significantly higher residue than did swallows 2 (p = 0.003) and 3 (p = 0.001), indicating improved performance during the swallow study for these variables. For pharyngeal wall function (BAS) and maximum width of cricopharyngeal opening, there were significantly higher mean scores for swallow 1 compared with swallows 2 (p = 0.012, p < 0.01) and 3 (p = 0.003, p < 0.01). For duration of cricopharyngeal opening, all three swallows differed at the p < 0.05 level.

For both liquid and semisolid consistencies, in all instances where a statistically significant difference existed across the swallows, either swallow 1 or swallow 3 was the source of this difference, with swallow 2 being in between. Because of this difference, it was decided that for all variables and for both consistencies, data from swallow 2 only would be used for the subsequent analyses of validity.

Reliability

Liquids

For the liquid consistency, all categorical data (presence/absence of swallow disorders) reached a percentage agreement of 75% or higher, indicating a good level of agreement across the three swallows. All BAS measures demonstrated good reliability (i.e., ICCs > 0.75), with values ranging from 0.78 (BAS swallow reflex) to 0.96 (BAS cricopharyngeal function) (Table 3). There were five biomechanical measures that did not reach the predetermined reliability criterion set in this study. These included oral transit time (0.53), pharyngeal transit time (0.73), pharyngeal delay time (0.62), duration of laryngeal elevation (0.63), and extent of hyoid excursion (0.73).

Semisolids

A number of swallowing measures obtained from the semisolid swallows did not reach a percentage agreement of 75% or an ICC value of 0.75 (Table 4). For the presence/absence measures, the presence of delayed swallow reflex reached only 54.1% agreement across the three swallows.

On the BAS, measures that did not reach an ICC value of 0.75 were tongue function (0.69), soft palate elevation (0.74), swallow reflex (0.66), and residue in pyriform sinuses (0.69). For the biomechanical measures, variables that did not reach 0.75 were oral transit time (0.70), pharyngeal transit time (0.54), pharyngeal delay time (0.56), duration of BOT-PPW contact (0.45), duration of laryngeal vestibule closure (0.45), duration of cricopharyngeal opening (0.65), extent of hyoid excursion (0.62), extent of vestibule closure (0.48), and penetration-aspiration (0.73).

Validity

Factor analysis revealed the presence of six factors each for liquids and semisolids, with factor loadings above 0.4 and eigenvalues above 1.0. The factors comprised 22 variables for liquids and 18 variables for semisolids. The factor “structures” underlying each condition (liquids and semisolids) were distinct and are discussed separately. The variables, factor loadings, and Cronbach’s alpha values for each of the six factors are shown in Tables 5 and 6 for liquids and semisolids, respectively. The maximum and minimum correlations for the variables within each factor are shown in Tables 7 and 8 for liquids and semisolids, respectively.

Table 5 Factor analysis: pattern matrix showing factor loadings for each variable within the factor, clinical labels, and correlations—liquid consistency
Table 6 Factor analysis: pattern matrix showing factor loadings for each variable within the factor, clinical labels, and correlations—semisolid consistency
Table 7 Intercorrelations of variables for each factor—liquid consistency
Table 8 Intercorrelations of variables for each factor—semisolid consistency

Liquids

The six identified factors explained 66.72% of the total variance, with Cronbach’s alpha values ranging from 0.71 to 0.92. The most important factor, explaining 31.76% of the variability, was labeled Pharyngeal Motility, encompassing variables that represent residue of the bolus within the pharynx and movement of pharyngeal structures to propel the bolus through the pharynx. Specifically, this factor included seven variables: residue in valleculae (BAS), number of swallows to clear bolus, pharyngeal wall function (BAS), maximum BOT movement, percent pharyngeal residue, area at maximum pharyngeal constriction, and residue in pyriform sinuses (BAS). The other five factors were assigned the following clinical labels: Duration of Pharyngeal/Laryngeal Functions, Extent of Pharyngeal/Laryngeal Functions, Swallow Initiation and Timing, Airway Protection, and Duration of Pharyngeal Transit. All six factors were then correlated with specific variables from the scale “presence/absence of swallow disorders” that were considered to be clinically related (Table 9). Point-biserial correlations ranged from 0.044 to −0.718, with the strongest relationship observed between Factor 5 (Airway Protection) and the presence of cricopharyngeal dysfunction (r pb = −0.718). Correlations were low overall, with almost all values being well below the predetermined level of 0.75.

Table 9 Correlations of BAS and biomechanical measures factors with presence/absence of swallow disorder variables

Semisolids

The six factors identified for semisolids explained 70.47% of the total variance, with Cronbach’s alpha values ranging from 0.47 to 0.93. Again, the most important factor related to the clinical concept of Pharyngeal Motility (explaining 27.45% of the variability) and included the same variables as liquids, excluding residue in pyriform sinuses (BAS). Clinical labels, as used for liquids, were assigned to three of the factors for semisolids: Duration of Pharyngeal/Laryngeal Functions, Extent of Pharyngeal/Laryngeal Functions, and Swallow Initiation and Timing. Factor 6 was assigned the clinical label Temporal Measures, though the clinical association of the variables oral transit time and duration of laryngeal elevation is unclear (Table 6). Cronbach’s alpha for this factor was only 0.466. All six factors were correlated with clinically related variables from the scale “presence/absence of swallow disorders” (Table 9). Again, point-biserial correlations were low overall, ranging from −0.067 to 0.864.

Discussion

This is the first study in which researchers have used comprehensive psychometric techniques (i.e., stability, reliability, and validity) to investigate different swallowing measures taken from VFSSs. Using this approach, we highlight issues of measurement of swallowing in a sample of H&N cancer patients with oropharyngeal and laryngeal tumors treated with (chemo)radiotherapy.

Stability

Stability varied between bolus consistencies, with more measures having poorer stability for semisolids than for liquids. There were particular differences identified between swallows 1 and 3, highlighting the variable nature of swallowing from trial to trial.

Results from the semisolid consistency indicated that the swallow function improved as a VFSS progressed. This applied to the variables presence of poor bolus formation, presence of prolonged oral transit, presence of pharyngeal residue, residue in valleculae (BAS), pharyngeal area at maximum constriction, percent of pharyngeal residue, and pharyngeal wall function (BAS). This suggests a practice effect (with improvement in pharyngeal constriction) and/or increased lubrication within the pharynx resulting in less pharyngeal residue of the semisolid bolus as the study progressed.

For the liquid consistency, analysis of the pharyngeal wall function (BAS) again indicated an improvement in function. However, there was an increase in the number of swallows required to clear the liquid bolus over the trial (indicating worsening function as the VFSS progressed). This may be an artifact of the coating effect of liquid barium. Thus, for swallows 2 and 3, more swallows were required to clear the already-coated anatomic structures than was the case for swallow 1. These results challenge our commonly accepted practices where the mean of multiple swallows is often taken to represent overall swallow function [5, 14, 23, 31, 39]. The legitimacy of this can be assured only in instances where the three recorded swallows are very similar, which in this study was not always the case, particularly for the variables pharyngeal wall function (BAS) and duration of cricopharyngeal opening, where significant differences were found across the three swallows for both consistencies. Classical test theory states that an observed score (X) is the sum of the true score (T) and an error component (E), i.e., X = T + E [16]. As such, taking the mean of three swallows provides the mean (and standard error) of the observed scores but does not take into account the error component of each individual swallow. Using results from swallow 2 provides a measure that more accurately represents the true score ± the error, such as potential practice (as seen in this study) or fatigue effects.

Reliability

Low ICCs were obtained for both liquid and semisolid consistencies for the following four variables: oral transit time, pharyngeal transit time, pharyngeal delay time, and extent of hyoid excursion. These findings suggest poor reliability of the measure itself, i.e., measurement error. According to Portney and Watkins [16], measurement error (i.e., the difference between the observed score and the true score) may be due to (1) rater (examiner) error, (2) the measuring instrument, or (3) variability in the trait or subject being measured. For example, for the variable oral transit time, brief oral “holding” of a bolus prior to the swallow or changes in oral lubrication may contribute to measurement error.

There were a number of variables for which a low-percentage agreement or ICC was calculated for one bolus consistency, but an adequate percentage agreement or ICC was calculated for the other consistency. For liquids, this finding applied only to duration of laryngeal elevation. For semisolids, this applied to presence of delayed swallow reflex, tongue function (BAS), soft palate elevation (BAS), swallow reflex (BAS), residue in pyriform sinuses (BAS), duration of BOT-PPW contact, duration of laryngeal vestibule closure, duration of cricopharyngeal opening, extent of vestibule closure, and penetration-aspiration scale. These results suggest that the measure may not be sensitive or accurate enough to capture that particular variable (or trait) during the swallow of that particular consistency. When examining duration of BOT-PPW contact, for example, some patients may have anticipated difficulties with a semisolid bolus and therefore performed a more effortful swallow, thus altering the dynamics of pharyngeal motility during all or some of their semisolid bolus swallows. These same patients may not have anticipated or experienced difficulties with liquids, which may have contributed to the more reliable results in measuring duration of BOT-PPW contact for liquids.

There were a number of variables for both consistencies where stability was poor, yet reliability for that same variable was high (Tables 3 and 4). For example, the variable percent of pharyngeal residue for semisolids had poor stability, so the true score significantly changed (i.e., there was less residue) as the VFSS progressed. However, reliability for this variable was high (0.85), indicating that despite the poor agreement, there is good consistency and the variable can be considered an accurate representation of the trait of interest (i.e., the percentage of residue within the pharynx).

Validity

The validity section of this study was divided into two components: (1) construct validity of the BAS and biomechanical measures and (2) concurrent validity of the presence/absence measures. To our knowledge, using factor analytic techniques to understand the psychometric properties of swallowing measures is a novel approach. Several key findings can be drawn from the two separate factor analyses for liquid and for semisolid boluses.

First, the amount of variation explained in both models was large, 66.72% and 70.47% for liquids and semisolids, respectively. With these results we can be confident that the six factors identified for each consistency accurately capture the construct of swallowing. It should be noted, however, that the clinical significance of Factor 6 for semisolids (Temporal Measures) was unclear.

Second, Pharyngeal Motility was the factor that explained the majority of variation in both liquids and semisolids. This is clearly an essential aspect of swallowing that needs to be measured in this population, and the results of the factor analyses identified seven variables for liquids and six variables for semisolids that could be considered suitable for capturing the movement of pharyngeal structures during the swallow and/or pharyngeal residue after the swallow.

Third, there were three factors, in addition to Pharyngeal Motility, that clearly captured swallowing for both liquid and semisolid consistencies: Duration of Pharyngeal/Laryngeal Functions, Extent of Pharyngeal/Laryngeal Functions, and Swallow Initiation and Timing. However, two factors were revealed for each consistency (Factors 5 and 6 for liquids and Factors 3 and 6 for semisolids) that were exclusive to that consistency only, indicating that slightly different characteristics are observed when each consistency is swallowed. These differences across consistencies are essential to consider when measuring swallowing from VFSSs, i.e., a tool that captures specific aspects of a liquid swallow cannot be assumed to be an equally valid tool for capturing a semisolid swallow.

The presence/absence measures correlated poorly with the factors for both semisolid and liquid consistencies, indicating poor concurrent validity of these dichotomous measures. This suggests that either much variability was unaccounted for (i.e., more than 49% in all cases) or that these variables are measuring a different construct, which seems unlikely. For example, Factor 2 for semisolids (Swallow Initiation and Timing), which included the variables pharyngeal delay time, pharyngeal transit time, and swallow reflex (BAS), did not correlate highly with the presence/absence measure delayed swallow reflex (r_pb = 0.527), although clinically we would anticipate that these measures should represent the same construct. Because these dichotomous variables (e.g., a disorder being present or absent) had little association with either the BAS or the biomechanical measures, this raises questions about their validity when being used to measure swallowing in H&N cancer patients. We acknowledge that the dichotomous variables are different in that they measure absolute abnormality, whereas the BAS or biomechanical measures measure the degree or severity of problems. These differences raise further questions about whether such dichotomous variables are sensitive to change over time, i.e., whether they can capture small but clinically meaningful changes from one VFSS to another, when they indicate only absolute abnormality.
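The link between a correlation coefficient and the variability left unaccounted for can be made explicit with a short calculation. The sketch below assumes that "variability unaccounted for" refers to one minus the squared correlation (the usual coefficient of determination reading); the function name is ours and is not taken from the study.

```python
def unexplained_variance(r: float) -> float:
    """Proportion of variance not shared between two measures: 1 - r^2."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return 1.0 - r * r

# Point-biserial correlation reported in the text for delayed swallow reflex
# against Factor 2 for semisolids (Swallow Initiation and Timing).
r_pb = 0.527

shared = r_pb ** 2                      # variance shared by the two measures
unexplained = unexplained_variance(r_pb)

print(f"shared variance:      {shared:.1%}")
print(f"unexplained variance: {unexplained:.1%}")
```

Under this reading, a correlation of 0.527 corresponds to roughly 28% shared variance and roughly 72% unaccounted for. Any correlation below about 0.71 leaves more than 49% of the variance unaccounted for.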

Differences in Bolus Consistencies

This study identified a number of significant differences between the two bolus consistencies. While previous researchers have found that different bolus consistencies yield different results in terms of the variable being measured (e.g., pharyngeal transit time, duration of cricopharyngeal opening), the results of this study go further, identifying potential problems with the reliability and validity of measurement variables when applied to different consistencies. Swallowing different bolus consistencies needs to be conceptualized separately; not only do specific variables differ for different consistencies, but the swallow as a whole differs. Therefore, using different measurements for each consistency may be a more accurate way to measure swallowing and monitor changes over time.

Recommendations

Six underlying factors of importance for swallowing in H&N cancer patients were identified, but these factors differed slightly across the two bolus consistencies. Slightly different aspects of swallowing therefore need to be measured if each consistency is to be accurately represented.

Intercorrelations among the variables within each factor can be used to identify which variables are the most apposite to measure. Variables that have a high loading on a factor (and are therefore a good representation of that factor) but low intercorrelations with each other (and therefore measure slightly different traits) are best. For example, Factor 1 for semisolids (Pharyngeal Motility) includes the variable number of swallows required to clear the bolus, which has a high loading on the factor (0.919) but a low intercorrelation with other variables (0.469 with the variable maximum BOT movement). We suggest that within each factor one or two measures are taken, selected on the basis of having a high loading on a factor, but a low intercorrelation with other variables, while taking into account the reliability of the measure itself.
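The selection rule described above can be sketched as a simple filter. Only the loading of 0.919 and the intercorrelation of 0.469 come from the text; the third variable, the remaining values, and both thresholds are hypothetical placeholders chosen for illustration, and the reliability of each measure is not modeled here.

```python
from itertools import combinations

# Factor loadings for one factor. Only 0.919 is reported in the text.
loadings = {
    "number_of_swallows_to_clear": 0.919,  # reported in the text
    "maximum_BOT_movement": 0.80,          # hypothetical value
    "other_variable": 0.75,                # hypothetical variable and value
}

# Pairwise intercorrelations. Only 0.469 is reported in the text.
intercorrelations = {
    frozenset({"number_of_swallows_to_clear", "maximum_BOT_movement"}): 0.469,
    frozenset({"number_of_swallows_to_clear", "other_variable"}): 0.82,  # hypothetical
    frozenset({"maximum_BOT_movement", "other_variable"}): 0.78,         # hypothetical
}

def pick_pair(loadings, intercorrelations, min_loading=0.7, max_r=0.5):
    """Pick the pair with high loadings and a low mutual correlation.

    min_loading and max_r are illustrative cut-offs, not values from the study.
    Returns None when no pair satisfies both criteria.
    """
    candidates = sorted(v for v, l in loadings.items() if abs(l) >= min_loading)
    eligible = [
        (a, b)
        for a, b in combinations(candidates, 2)
        if abs(intercorrelations[frozenset({a, b})]) <= max_r
    ]
    # Among eligible pairs, prefer the one with the largest combined loading.
    return max(
        eligible,
        key=lambda pair: abs(loadings[pair[0]]) + abs(loadings[pair[1]]),
        default=None,
    )

selected = pick_pair(loadings, intercorrelations)
print(selected)
```

A fuller version would also drop any variable with poor reliability before pairs are formed, in line with the suggestion to take the reliability of each measure into account.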

Results of the factor analyses indicate that by selecting a minimum set of measures in this manner, adequate construct validity would be achieved, and the remaining variables within each factor are then not essential. For example, when considering the Duration of Pharyngeal/Laryngeal Functions (Factor 2) of liquids, if the variable duration of BOT-PPW contact is measured, then the variables duration of vestibule closure and duration of laryngeal elevation do not need to be measured. Because duration of laryngeal elevation was found to have poor reliability for liquids (Table 3), selecting an alternative measure to represent the Duration of Pharyngeal/Laryngeal Functions is appropriate.

Further research is needed to investigate the stability, reliability, and validity of swallowing measures that are being used with different populations. The issues identified in this article remain to be investigated with other measurement tools that are currently used in research and clinical settings, such as FEES, pharyngeal manometry, and oral tongue pressure measurements.

Conclusions

This study relates to specific data taken from an H&N cancer population that was treated with (chemo)radiotherapy for oropharyngeal or laryngeal tumors. However, the results may be applied to other populations, including normals, and to the examination of other swallowing tools.

The outcomes of this study raise questions about the current methods that are used to measure swallowing from VFSSs. We have found that different measures may be important for assessing swallowing of different consistencies, and that data taken from the mean of three swallows may not be an accurate representation of swallowing because the stability of swallow sequences needs to be considered.

We have identified key elements that need to be included to comprehensively and accurately assess swallowing function in this H&N cancer population. We have not specified what the best measures are, but we suggest specific elements to capture the key characteristics of swallowing different bolus consistencies.

Ongoing investigation into the psychometric properties of our clinical tools will further contribute to our understanding of accurate measurement, thereby improving dysphagia research and practice.