The presentation of head and neck (H&N) cancer has been associated with various degrees of swallowing dysfunction (dysphagia). Dysphagia may be either the result of the underlying disease process or the sequelae of commonly used treatment protocols [1–12]. Sites of H&N cancer associated with dysphagia include the oral cavity, pharynx, and larynx [1–12], as oral, pharyngeal, and laryngeal musculature play an important role in swallowing.

The tongue is an integral organ for swallowing. Anatomically, the anterior two thirds of the tongue fill the oral cavity. The dorsal surface of the tongue faces the hard palate. The posterior one third extends from the sulcus terminalis to the epiglottis, forming the anterior wall of the oropharynx. Structurally, the tongue is composed of striated skeletal muscle, covered with mucous membrane. Intrinsic muscles are located within the tongue and have no bony attachments. The extrinsic muscles of the tongue have bony attachments to the hyoid bone, hard palate, and mandible and are responsible for stabilizing the tongue. Contraction of the extrinsic muscles alters the position of the tongue and changes its shape [13,14].

Complex, graded movements of the oral tongue and base of tongue are essential to bolus containment, loading, propulsion, and clearance during swallowing [15–17]. The tongue is the principal anatomic structure responsible for bolus propulsion through the oral cavity into the oropharynx [15,17] and from the oropharynx into the esophagus [18,19]. During the oral stage of swallowing, the tongue presses against the hard and soft palates and moves sequentially in an anterior to posterior direction to propel the bolus to the pharynx [15,17].

H&N cancer and its associated treatments can adversely affect tongue function [1–12]. Patients with H&N cancer, before treatment onset, have demonstrated significantly lower oral tongue strength (defined as maximum isometric pressure or force generation) when compared with normal (noncancer) subjects [6]. Negative effects that surgery, chemotherapy, radiotherapy, and multimodality treatment protocols have on tongue function have been documented [1,2,4–7,10,12]. Patients with tongue base resection experience a decrease in volume of tongue base available for pressure generation and, consequently, a reduction in the tongue driving force [12]. Radiotherapy leads to tissue necrosis, which may extend to fibrosis, reducing the range of tongue movement. Researchers evaluating swallowing outcomes after radiotherapy have identified increased space between the base of the tongue and posterior pharyngeal wall compared with that in normal control subjects, at the point where maximum contraction is expected [4,5]. This would likely result in lower than normal propulsive pressures being generated at the base of the tongue. It has been documented that chemoradiotherapy significantly reduces tongue strength in oral and oropharyngeal cancer patients [20].

Since H&N cancers and their treatment can impair tongue movement, the assessment of tongue function is required in this patient population. Information gained from research into tongue function could be significant for treatment planning and for swallowing therapy. Oral tongue pressures have been captured using various methods [6,20,21–28]. Two popular methods for tongue pressure measurement are the Iowa Oral Performance Instrument (IOPI) [6,20–22,25,26] and the Kay Swallowing Workstation (KSW) three-bulb tongue pressure array [23,27]. The KSW, an integrated computer-based system, is unique for its capacity to simultaneously collect and record information on multiple physiologic aspects of swallowing in real time [27], including oral tongue pressures, together with videofluoroscopic swallowing studies (VFSS). The IOPI is a hand-held portable device containing one small soft air-filled bulb, which is pressed against the roof of the mouth by the tongue to measure strength and fatigability of oral musculature [21]. While both the KSW and the IOPI have been used to research tongue function with therapeutic intent, to our knowledge no published evidence exists to confirm their reliability.

Currently there is a lack of methodologic rigor in speech pathology dysphagia research. In published studies a majority of researchers have taken multiple measures of one parameter [5–8,23] and, in many cases, they have then used the mean of these measures for analyses [5,7]. Taking the mean of the measures is only meaningful if the aspect being measured is captured using a reliable tool [29]. Reliability considers how much a measure is free from random variability (chance error), while acknowledging systematic variability (predictable error) in measurement [29]. Systematic and random variabilities may arise from (1) the tester, (2) the measuring instrument, and (3) variability in the subject being measured [29]. Mean scores based on variable data will not be representative of true data values and results will not be generalizable. Therefore, tool reliability must be first assured in order to achieve rigor in dysphagia research.

We have found no published studies that examined the reliability of the tools used for the capture of oral tongue pressures, which include the KSW tongue pressure array and the IOPI. Poor reliability in data collection may translate into inaccurate conclusions being drawn from the data. Therefore, it is essential to establish reliability of the pressure measurement tool before using it clinically and in research. This article addresses the evaluation of the reliability of the KSW tool for capturing oral pressures generated between the tongue and the hard palate during swallowing in the H&N cancer patient population.

This article contrasts two methods for capturing oral tongue pressure data. The first method involves the use of a hand-held silicon tongue array (Method 1) and the second method (Method 2) uses a similar array that adheres to the hard palate (Figs. 1 and 2). The hand-held KSW array raised concerns about patients’ difficulty complying with the primary requirement of maintaining a stable position over a prolonged period of time, causing movement of the array. Therefore, it was considered important to trial and contrast the reliability of the hand-held array with that of the KSW array that was fixed in position. In this article, for Methods 1 and 2 we (1) explore the variability of peak tongue pressure data at the anterior, medial, and posterior pressure sensors during three swallows, (2) examine the reliability of tongue pressure data, and (3) compare the reliability of the two methods. It was hypothesized that using Method 2 would reduce measurement error and produce a more reliable set of tongue pressure data by eliminating movement between and within swallows.

Fig. 1. Hand-held array (left) vs. fixed-position array (right), inferior view.

Fig. 2. Hand-held array (right) vs. fixed-position array (left), superior view.

Method

Participants

As part of an ongoing research program, two consecutive, nonrandomized, incidental samples of participants were recruited before onset of H&N cancer treatment at the Peter MacCallum Cancer Centre (PMCC), Melbourne, Australia. These participants were part of two separate studies. Therefore, group numbers were not equal because of variations in participant availability and recruitment procedures. In 2000, for the first study 21 participants were assessed using Method 1 (hand-held tool). For the second study, in 2002, ten participants were assessed with Method 2 (fixed-position tool) (Table 1). Participants were excluded if they had a prior history of dysphagia; a history of respiratory disorder that could impact upon swallowing function; a history of previous H&N cancer, or were unable to give informed consent. Each participant signed the informed consent form before inclusion in the investigation.

Table 1 Participant demographics for Methods 1 and 2

Procedure

Both the PMCC and La Trobe University (LTU) Human Ethics committees gave approval for these studies. All participants attended the videofluoroscopy suite (diagnostic imaging department) of PMCC, usually one to two weeks before beginning their cancer treatment. Relevant demographic data were collected for all participants, including name, unique hospital patient number, address, phone number, and date of birth. Diagnoses and planned H&N cancer treatment were also recorded.

The KSW was interfaced with a fluoroscopy unit and the following attachments: (1) for tongue pressure measurement, an intraoral silicon plate with three pressure-sensitive, air-filled bulbs embedded in the silicon and (2) for laryngeal activity measurement, one skin surface electromyography (sEMG) probe. The laryngeal sEMG measurements were recorded via one probe, attached to the external skin surface, with its three electrodes positioned on the thyroid prominence and the left and right sides of the thyroid cartilage. The KSW monitor screen was customized for simultaneous display of all variables (Fig. 3).

An intraoral silicon tongue array with three equidistant pressure-sensitive transducers was used to capture oral tongue pressure data in both studies. For Method 1, the array was attached to a flexible metal spline (Figs. 1 and 2). The array was positioned by the researcher on the dorsal tongue surface, with the most anterior and posterior pressure transducers resting on the anterior one-third and two-third margins of the tongue, respectively. The metal spline was then held in situ by the participant throughout all swallows under each condition. To minimize participant fatigue, the hand-held metal spline was removed from the oral cavity between liquid and pudding conditions. In Method 2, each participant had a “splineless” tongue pressure array attached to their hard palate via a piece of stomadhesive wafer cut to size (Figs. 1 and 2). The anterior bulb was positioned on the subject’s alveolar ridge. The middle and posterior bulbs were approximately located at the middle of the hard palate and at the border of the hard and soft palate, respectively.

For both methods of pressure data capture, participants were seated lateral to the X-ray equipment. Images were focused according to the protocol recommended by Logemann (1998) [30]. Lips and posterior pharyngeal wall defined the anterior and posterior aspects of the image, respectively, and the superior surface of the hard palate and the entrance to the esophagus delineated the superior and inferior aspects of the image.

Fig. 3. KSW screen display.

For both methods of data capture, simultaneous recordings of VFSS and sEMG measurements were taken using the KSW. Recordings occurred as participants swallowed three 5-ml boluses of radiopaque (using X-OPAQUE-HD, 977 mg/g barium sulfate) pudding and three 5-ml boluses of liquid (i.e., six swallows were recorded for each subject in total). Participants were instructed to take the whole bolus from a teaspoon in a natural manner for realistic swallowing conditions to occur [31,32] and to minimize variability in swallowing effort. This internationally accepted protocol optimizes validity of the results by increasing the representativeness of swallowing measures [32].

VFSSs, laryngeal sEMG, and oral tongue pressure data generated during the swallow were all displayed on the KSW monitor and viewed in real time to ensure all components were recording correctly. Laryngeal sEMG and oral tongue pressure data were saved to the KSW hard drive and then downloaded to a Zip drive. All VFSSs were recorded on Super-VHS videotapes and later copied. All data underwent post-hoc analyses at the swallowing laboratory at La Trobe University.

Both the laryngeal sEMG and the oral tongue pressure data were converted into linear graph form (Fig. 4) where the pattern of the swallow and the peak pressure scores could be observed. The sEMG tracing, coupled with videofluoroscopy observations, enabled simple and accurate identification of the timing of initiation and conclusion of the pharyngeal swallow. For each swallow, peak tongue pressure values at the anterior, middle, and posterior sensors were extracted from the raw data. The final outcome was 18 peak tongue pressure data points for each participant (liquid bolus = 9, pudding bolus = 9).
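The peak-extraction step described above can be sketched as follows. This is a minimal illustration, not the KSW software's own procedure: the function name `peak_pressures`, the dictionary layout, and the sample values are all hypothetical, and a swallow is assumed to have already been segmented (for example, using the sEMG trace and videofluoroscopy).

```python
from typing import Dict, List

SENSORS = ("anterior", "middle", "posterior")


def peak_pressures(trace: Dict[str, List[float]]) -> Dict[str, float]:
    """Return the peak (maximum) pressure per sensor for one segmented swallow."""
    return {sensor: max(trace[sensor]) for sensor in SENSORS}


# Hypothetical pressure samples (mmHg) for one swallow; not data from the study.
swallow = {
    "anterior": [0.0, 12.5, 48.0, 30.1, 2.0],
    "middle": [0.0, 8.0, 35.5, 41.2, 5.0],
    "posterior": [0.0, 3.0, 20.0, 27.5, 9.0],
}
peaks = peak_pressures(swallow)

# 3 sensors x 3 swallows x 2 bolus conditions = 18 peak values per participant
points_per_participant = len(SENSORS) * 3 * 2
```

A sensor that recorded nothing during a swallow would need to be flagged as missing before this step, since `max` cannot be applied to an empty trace.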

Fig. 4. Excel graph depicting tongue pressure recorded at anterior, medial, and posterior sensors during the swallow.

Data Screening and Analysis

Data were screened for normality and homogeneity of variance using several methods, including histograms, box plots, and calculations of skewness and kurtosis. For both sets of data, assumptions of normality and homogeneity of variance were violated, with the data being skewed in both directions and bimodally distributed. To enable the use of parametric statistical analyses, base-10 logarithmic (log10) data transformation was conducted according to guidelines in Tabachnick and Fidell [33]. Post-transformation screening revealed normally distributed data and homogeneity of variance, thus fulfilling the assumptions of parametric statistical analyses.
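As an illustration of the screening and transformation step, the sketch below computes a simple moment-based skewness before and after a log10 transformation. The helper names and the pressure values are hypothetical, and the skewness formula (third central moment divided by the 1.5 power of the second) is one common definition; the statistical package used in the study may apply a different, bias-corrected estimator.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample correction)."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m3 = mean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


def log10_transform(values):
    """Base-10 log transform; only defined for strictly positive values."""
    return [math.log10(v) for v in values]


# Hypothetical right-skewed peak pressures (mmHg); not data from the study.
raw = [5.0, 8.0, 10.0, 12.0, 20.0, 40.0, 160.0]
transformed = log10_transform(raw)

skew_raw = skewness(raw)
skew_log = skewness(transformed)
```

For these made-up values the transformation pulls in the long right tail, so the skewness of the transformed scores is smaller in magnitude than that of the raw scores.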

The main independent variable for this study was swallow order (the position of the swallow in the sequence), which had three levels: first, second, and third. The focus of this study was to compare the three swallows across two conditions: liquid and pudding boluses. This approach was adopted to examine both methods (i.e., hand-held and fixed-position tongue arrays). A series of one-way repeated-measures analyses of variance (ANOVA) was conducted to determine the presence of systematic biases in the swallows [34], that is, whether pressures tended to increase or decrease in a systematic manner (e.g., with fatigue). Differences in tongue pressures across the three swallows for each bolus condition using both methods were examined with respect to the anterior, medial, and posterior bulb positions separately. The assumption of sphericity was also examined using Mauchly’s test of sphericity and, when violated, a Greenhouse–Geisser adjustment was made to degrees of freedom (df). Bonferroni post-hoc analyses were conducted for significant F values. Intraclass correlation coefficient (ICC) was used to investigate test–retest reliability because it reflects both the degree of correlation and the agreement among scores [29]. An ICC Model 3 or a two-way mixed model was selected as the most appropriate test because it enables the testing of intrarater reliability with multiple scores from the same rater [29]. Hence, it was used to explore the correspondence and agreement between mean peak tongue pressure scores, across both methods and conditions. Examining the literature on interpretation of ICC values indicated that there are no hard and fast rules for inferring acceptable reliability. In general, values of 0.75 or above are suggestive of good reliability; however, values of 0.9 and above are likely to be more reliable in ensuring validity and reproducibility of clinical measurements [29]. An ICC value of 0.85 or above was chosen for this study to represent adequacy of reliability.
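The two analyses described above share one variance decomposition, which can be sketched as follows. This is a minimal illustration under stated assumptions: the function `rm_anova_and_icc3` and the scores are hypothetical, no sphericity correction is applied, and both the single-measure form, ICC(3,1), and the average-measure form, ICC(3,k), are shown because the text does not specify which form was reported.

```python
from statistics import mean


def rm_anova_and_icc3(scores):
    """Two-way decomposition of an n x k table (rows = participants,
    columns = repeated swallows), with no replication."""
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between swallows
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return {
        "F": ms_cols / ms_error,            # repeated-measures F for swallow order
        "df": (k - 1, (n - 1) * (k - 1)),
        "icc_3_1": (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error),
        "icc_3_k": (ms_rows - ms_error) / ms_rows,
    }


# Hypothetical peak pressures (mmHg), 4 participants x 3 swallows; not study data.
scores = [
    [10, 12, 14],
    [20, 19, 21],
    [30, 32, 31],
    [40, 41, 42],
]
result = rm_anova_and_icc3(scores)
```

In this made-up table the participants differ far more from each other than the swallows differ within a participant, so both ICC forms are close to 1, while the F ratio still registers the small upward drift across swallows. This illustrates how a systematic shift and high reliability can coexist under ICC Model 3.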

Results

Missing Data

Screening of tongue pressure data captured using Methods 1 and 2 revealed missing data in both conditions (Table 2). For Method 1, 22.0% (n = 83, N = 378) of all data points were missing, whereas only 0.6% (n = 1, N = 180) were missing for Method 2. Visual inspection of missing data from Method 1 at the anterior, middle, and posterior sensors revealed the following patterns: First, the highest proportion of missing data points was from the anterior sensor (with 42.9% and 47.6% of data captured under liquid and pudding bolus conditions, respectively); second, there were more data points missing for pudding boluses than for liquid bolus conditions at the middle and posterior sensors; and last, there were fewer data points missing for the third swallow, in contrast to the first swallow, for both liquid and pudding bolus conditions, with the exception of data captured at the posterior sensor for liquid and at the anterior sensor for pudding. Only one data coordinate was missing from data captured using Method 2. This occurred at the anterior tongue sensor during the second swallow from one subject when swallowing a liquid bolus.
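The totals and percentages above follow from the design of 3 sensors, 3 swallows, and 2 bolus conditions per participant. The short check below reproduces that arithmetic; the function name `missing_summary` is hypothetical.

```python
def missing_summary(n_participants, n_missing, sensors=3, swallows=3, conditions=2):
    """Return (expected data points, percent missing rounded to 1 decimal)."""
    expected = n_participants * sensors * swallows * conditions
    return expected, round(100 * n_missing / expected, 1)


method_1 = missing_summary(21, 83)  # hand-held array
method_2 = missing_summary(10, 1)   # fixed-position array
```

With 21 participants the expected total is 378 data points, of which 83 missing corresponds to 22.0%; with 10 participants the expected total is 180, of which 1 missing corresponds to 0.6%.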

Table 2 Missing data in both conditions

Descriptive Statistics

The means, standard deviations, and ranges for peak tongue pressure scores for all methods and conditions are presented in Tables 3 and 4. For both methods of tongue pressure data capture and across both bolus conditions, mean scores varied considerably between the three swallows.

For Method 1 (hand-held), there were no significant differences between the mean tongue pressures for swallows one, two, and three, under either liquid or pudding bolus conditions for the three sensor positions. For Method 2 (fixed-position), there was a significant difference between the three swallows for the pudding bolus condition at the anterior sensor [F(2,18) = 6.49, p = 0.008, η2 = 0.42], with Bonferroni post-hoc analyses indicating a significant difference between swallows one and three (p = 0.026). Forty-two percent of the variability (as indicated by η2, a measure of effect size) was attributable to swallow order. For the medial and posterior sensors, no significant differences were detected under the pudding bolus conditions.
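The reported effect size can be cross-checked against the F statistic and its degrees of freedom. The sketch below assumes that the reported η2 is partial eta squared, which the text does not state explicitly; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F: (F * df1) / (F * df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Degrees of freedom for 10 participants and 3 swallows
n_participants, n_swallows = 10, 3
df_effect = n_swallows - 1                        # 2
df_error = (n_participants - 1) * (n_swallows - 1)  # 18

eta = partial_eta_squared(6.49, df_effect, df_error)
```

Under that assumption the value is 12.98 / 30.98, or about 0.42, which agrees with the reported effect size and with the degrees of freedom of F(2,18).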

Table 3 Descriptive statistics for tongue pressure data (in mmHg)—liquid condition
Table 4 Descriptive statistics for tongue pressure data (in mmHg)—pudding condition

Reliability of Tongue Pressures

In Method 1 (hand-held), ICC values ranged from 0.34 (liquid bolus conditions, posterior sensor) to 0.81 (pudding bolus conditions, medial sensor) (Table 5). No values reached or exceeded the predetermined acceptable value for this study of 0.85. Furthermore, 50% of ICC values for the hand-held array did not exceed the minimum acceptable level of 0.75.

Table 5 Intraclass correlation coefficients (ICCs): values for tongue pressure data captured using Methods 1 and 2

For Method 2 (fixed-position), the lowest ICC value was 0.86 (for the posterior bulb) and the highest was 0.94 (for the anterior bulb) under pudding bolus conditions (Table 5). All values exceeded the predetermined acceptable ICC value for this study of 0.85. In fact, 50% of ICC values produced for the fixed-position array exceeded 0.9, the value generally accepted as most likely to ensure validity and reproducibility of clinical measurements.

Discussion

This study is one of a few [35–40] to examine psychometric measurement reliability within speech pathology dysphagia research. Since evidence-based practice is considered the “hallmark of clinical care” [41], it is essential that reliable tools are used for the capture of tongue pressures. Preliminary testing of the reliability of two methods for tongue pressure capture using the KSW tongue pressure array, Method 1 (hand-held) and Method 2 (fixed-position), was undertaken.

Rigor of Data Capture

Variance in the amount of missing data points between Methods 1 (hand-held) and 2 (fixed-position) is a likely consequence of methodologic differences. In particular, limitations of tongue pressure data captured using Method 1 were apparent.

There are no specific guidelines to indicate the quantity of missing data that can be tolerated without impacting data generalizability. However, the high proportion of missing data in Method 1 (hand-held) was considered unacceptable because it significantly reduced the quantity of data captured and failed to represent the entire population of tongue pressures at the anterior, medial, and posterior sensors. According to Tabachnick and Fidell (2001), nonrandomly missing data may pose a serious problem in the generalizability of results [33]. Although statistical tests were not conducted in this study to investigate the randomness of missing data, visual inspection suggested a pattern of increased missingness at the anterior sensor. Clinical and VFSS observations indicated transient movement of the tongue array in the anterior and lateral directions, preventing the anterior sensor from maintaining contact with the tongue during swallowing. In this case, it is possible that pressures recorded by the medial and posterior sensors reflected pressures generated more anteriorly and laterally along the tongue surface than the targeted points of sensor contact. Factors identified as being likely contributors to tongue array movement included discomfort/elicitation of the gag reflex, triggered by contact of the tongue array with the hard and soft palate margins in some participants, and participant fatigue, compromising ability to maintain stable array position over time.

Considering the above factors, the marked difference in the proportion of missing data points between Methods 1 (hand-held) and 2 (fixed-position) is not surprising. Method 2 does not allow movement of the array. Fixation of the tongue array in Method 2 also minimizes the likelihood of a fatigued participant moving the array from one position to another during data capture.

The Impact of Group Differences

We acknowledge that the mean age of the participants using Method 1 (M1 = 63.8 years) was approximately ten years higher than that of participants using Method 2 (M2 = 53.5 years). The negative impact of aging on swallow physiology has been well documented [42,43]. It is possible that age differences between groups contributed to the difficulty experienced by Method 1 participants in tolerating the hand-held tongue pressure array. Tumor site also differed between groups. Method 1 participants predominantly had oropharyngeal (47.6%) and laryngeal (42.8%) tumors, while most Method 2 participants had oropharyngeal tumors (70%). Theoretically, the presence of a tumor in or around the oral cavity would be expected to compromise a patient’s tolerance for a tongue array compared with patients who have laryngeal tumors. Despite this expectation, in Method 2, where a greater proportion of participants had oropharyngeal tumors, tongue array tolerance appeared superior compared with Method 1. With the current sample size, we cannot definitively identify the impact of age and tumor site on tolerance of the tongue pressure array; however, further clinical research with a larger sample size would be useful in addressing these issues.

Variability of Tongue Pressures

For Method 1 (hand-held), the absence of statistical difference between tongue pressures captured during the three swallows was unexpected. It was anticipated that intrusiveness and movement of the array would result in highly variable tongue pressure data. A possible explanation for the statistical nonsignificance is the bias introduced by the quantity and nature of missing data. Out of 21 potential data points for each swallow, between 1 and 10 data coordinates were missing (see Table 2) for all bolus conditions and at all sensors. A large volume of missing data reduced sample size for statistical analyses, increasing the chance of missing an existing difference between the three swallows. It is also possible that the most variable swallows were those with missing data points (i.e., where most movement of the array occurred), thus further biasing the sample.

Interestingly, there was a significant increase in tongue pressures between the means for swallows one and three for Method 2 (fixed-position) under pudding bolus conditions at the anterior sensor. In fact, 42% of the variability in tongue pressure data at this bulb could be attributed to swallow order (as indicated by η2), strongly suggesting the presence of a systematic bias. A possible reason for this observation is that participants adapted to the intrusiveness of the tongue array, in this case by increasing tongue pressures generated against the hard palate during the swallow. This behavior could have been a response to increasing bolus residue in the oral cavity, as noted by Hind et al. [44]. VFSS observations revealed that participants naturally performed multiple clearing swallows between bolus presentations, which may explain why increasing bolus residue in the oral cavity was not observed in this study.

Reliability of Tongue Pressures

For Method 1 (hand-held), ICC scores generally suggested poor reliability. Clinical and VFSS observations demonstrated that the array was moving transiently in anterior and lateral directions during data collection. Transient movement created increased opportunity for random error in data collection, and thus poor reliability is not unexpected for this methodology. The unreliable data produced by this hand-held tool compromise the researchers’ ability to use it in studies measuring change over time and to generalize findings beyond the current sample. To our knowledge, reliability of other non-fixed–position tools, such as the widely used IOPI, has not yet been established. Reliability studies using such non-fixed–position tools would enhance meaningful interpretation of the data captured by them.

In Method 2 (fixed-position), ICC scores suggested that there was satisfactory reliability in data captured using the fixed-position array, indicating that data recorded via this method were predominantly free from random error. While systematic error was present, it did not decrease reliability of the fixed-position tongue array. Removing the opportunity for extraneous movement by fixating the tongue array minimizes random error during data collection, thus optimizing reliability. A reliable measurement tool can be used with greater confidence to collect multiple baseline measures, and data obtained from the sample are more likely to be generalizable to the larger population [29]. Although measurement tools are seldom perfectly reliable, the simple modification of fixating a tongue pressure measurement tool’s position can markedly increase its reliability. This highlights the critical importance of analyzing reliability, because small changes to tool design can make a substantial difference to the rigor of dysphagia research.

Because two separate incidental convenience samples were used in the current study, factors such as sample size and age differences between participants in Methods 1 and 2 could not be controlled for, thereby limiting the generalizability of results. Larger prospective studies are required to confirm reliability of data produced by the fixed-position array. Once reliability has been confirmed, there will be a need to establish normative data of tongue pressures generated during swallowing. Pre- and post-treatment examinations of tongue pressures during swallowing in the H&N cancer patient population also need to be performed. Finally, further research is required to test for the presence of different systematic biases, and other population-specific characteristics, that may exist in particular patient groups.

Conclusion

Poor reliability of tongue pressure data captured using Method 1 (hand-held) holds implications for other tongue pressure measurement tools that are not fixed in position. This study suggests that caution in the application of such devices is warranted, since data from hand-held devices may be confounded by measurement error and may not accurately reflect the aspect being measured. Therefore, further research confirming the reliability of these tools is necessary. It is possible that small alterations to instrument design, in this case, fixing the tongue pressure array position, may offer a way of reducing error of measurement and thus increase tool reliability. The use of potentially unreliable tools in dysphagia assessment can compromise our ability to draw conclusions and to generalize findings. It also threatens the effectiveness and soundness of our clinical decision making. As accountable clinicians and researchers, we need to be mindful when using tools where reliability is unknown, while we endeavor to provide high-quality care based on strong evidence.