The ability to measure normality and abnormality, and to accurately assess true changes in swallowing function over time (i.e., change that is not due to measurement error), is important for the management of dysphagia. Furthermore, ensuring the clinical utility of our swallowing measurements (i.e., using measures that are valid, stable, and reliable) is necessary to evaluate the effectiveness of therapeutic interventions [1, 2].

In the case of dysphagia measures, stability refers to the level of “agreement” between two or more swallows across multiple trials, that is, whether any systematic bias, such as fatigue or a learning effect, affects the obtained results. Reliability refers to both agreement and consistency across swallows. A measure of reliability is obtained by investigating the presence of random error while acknowledging any systematic error (which is predictable and consistent) [3]. Demonstrating good stability and reliability implies that a measurement tool will produce accurate results.

There is a paucity of information regarding the stability and reliability of the measurement tools that are used for dysphagia research. Without such information, the common practice of taking a mean of measures obtained across multiple swallows to represent a participant’s true swallowing function lacks accuracy. We contend that such a practice is acceptable only if the variable being measured has been shown to be stable, and if it is captured using a measurement tool that has a demonstrably good level of reliability.

The tongue is important during both the oral and the pharyngeal phases of swallowing, being primarily responsible for bolus propulsion during deglutition [4, 5]. The tongue comprises muscle, connective, and adipose tissues, and is covered in both keratinised and non-keratinised squamous cell epithelium. For descriptive purposes, it is commonly divided into three primary sections: the anterior, medial, and posterior tongue. Significant differences in the levels of muscle, connective, and adipose tissues are found across these three sections of the tongue. These differences exist as a result of the different functions that each of these areas contribute to speech and swallowing [6]. Upon initiation of the swallow, the anterior and medial sections of the tongue maximally elevate in a sequential manner to form a firm lingua-palatal contact area that sweeps posteriorly, forcing the bolus into the pharynx [7–10]. While the tongue works as a whole to propel the bolus into the pharynx, the nature of movement, and the magnitude, duration, and timing of lingua-palatal contact made by each section of the tongue during swallowing, are distinctly different [7, 9, 11, 12].

Both head and neck (H&N) cancer and its treatment(s) have been shown to significantly affect tongue function during swallowing. Before treatment, it has been purported that patients with H&N cancer propel boluses with less lingual force than do controls [13]. As a result of this, patients often display significantly increased oral transit times and increased levels of post-swallow oral residue, when compared with normal controls [14]. Both before and after treatment, H&N cancer patients have been shown to produce lower maximum oro-lingual pressures on non-swallowing tasks than do controls [15]. The further negative effects of surgery, chemotherapy, and radiotherapy upon deglutitive tongue function have been widely documented [15–18].

As the presence of H&N cancer and its treatments have been shown to negatively affect deglutitive tongue function, it is important that we have a reliable method to measure this within this population. It is also necessary to have an accurate measurement of deglutitive tongue function in order to examine changes in function after therapy. Oro-lingual swallowing pressures have previously been used as a measure of deglutitive tongue function [19–21]. Differing methods have been used to capture oro-lingual pressures [11, 15, 19–36]. The two most widely reported methods are those involving the Iowa Oral Performance Instrument (IOPI) [26–31, 34, 37] and the Kay Swallowing Workstation’s (KSW) three-transducer tongue pressure array [19–21, 25, 32, 33, 36]. The KSW is a computerised system that enables multiple measures of swallowing to be recorded simultaneously (Fig. 1). Three such measures are: a Videofluoroscopic Swallowing Study (VFSS), surface electromyography (sEMG), and oro-lingual pressures [38].

Fig. 1 Kay Swallowing Workstation (KSW)

In an earlier study by Ball et al. [32], the comparative reliabilities of a hand-held and a fixed device for measuring oro-lingual pressures during swallowing in patients with H&N cancer were investigated. The two methods investigated used variants of the same tool; one involved a hand-held silicon plate, the other used the same flexible silicon plate, but adhered to the roof of the mouth. Both versions consisted of three-transducer tongue pressure arrays (with transducers in anterior, medial, and posterior positions). The fixed tongue plate was the more reliable tool with patients with H&N cancer. This was attributed to the variable movements of the hand-held tongue plate during swallowing, resulting in increased opportunity for random error in data collection. Further, in that study [32], the measures obtained across the three transducers of the fixed pressure plate (anterior, medial, posterior) were shown to be differentially reliable. Under the condition of swallowing a pudding bolus, the lowest intraclass correlation coefficient (ICC) value obtained was 0.86 for the posterior transducer, and the highest was 0.94 for the anterior transducer. However, both values signified a good level of reliability [3], indicating that the data recorded using the fixed plate method were less susceptible to measurement error.

The differences in reliability obtained across the three transducers are perhaps not surprising, in the light of the different structures and swallowing facility across the anterior, medial, and posterior sections of the tongue. Similarly, we might expect that the reliability of measurements obtained when swallowing different bolus consistencies will differ. Differences in bolus viscosity have been shown to impact not only upon swallowing physiology [24, 29, 39–42], but also upon the reliability of specific measurement methods used to assess swallowing function (Frowen et al., unpublished). Thus, it is important when investigating the reliability of a measurement tool that differing bolus consistencies are considered separately.

Although the findings of the Ball et al. [32] study demonstrated that the measurements obtained using the fixed silicon tongue plate produced reliable measures, the study was limited by a small sample size (10 participants). In addition, the researchers investigated the reliability of oro-lingual swallowing pressures obtained only from patients with H&N cancer. This limits the transferability of this reliability information to studies conducted by others [20, 21, 25, 33, 36] in which oro-lingual swallowing pressures were obtained from healthy adults.

In the current study, we recruited two groups of participants: (1) a convenience sample of patients with H&N cancer and (2) healthy adults, who were matched to the H&N cancer patients on age (within ±3 years) and gender. There were three aims to the current study: first, to determine differences in oro-lingual pressure between these groups when swallowing 3-ml semi-solid boluses; second, to determine the stability of oro-lingual pressure measurements across three swallows within the two groups and separately for each transducer (anterior, medial, and posterior); and third, to determine the test-retest reliability of the oro-lingual swallowing pressure measurements for the two groups and for each transducer (anterior, medial, and posterior).

As H&N cancer has been shown to impact upon deglutitive tongue function [13, 14], it was hypothesized that patients with H&N cancer would demonstrate lower oro-lingual swallowing pressures than controls. As Ball et al. [32] found there to be a significant difference in peak oro-lingual pressures across the three swallows at the anterior transducer for H&N cancer patients, and in light of the reported differences in deglutitive tongue function across the three sections of the tongue (anterior, medial, posterior), it was anticipated that oro-lingual swallowing pressures would differ across the three swallows for H&N cancer patients, and that the pressures obtained from each of the transducers would be differentially stable and reliable for both groups.

Method

Participants

A non-randomised, convenience sample of 19 newly diagnosed H&N cancer patients was recruited from the outpatient H&N cancer clinic at Peter MacCallum Cancer Centre (PMCC), Melbourne, Australia. It is important to establish baseline (pre-treatment) measures to ascertain (1) the effect of cancer itself on swallowing function and (2) the baseline, for future comparison with post-treatment outcomes.

Oro-lingual swallowing pressure data were obtained from a group of age- and gender-matched controls at the Geriatric Research Education and Clinical Centre (GRECC), University of Wisconsin, USA (age was ±3 years; gender match was exact) (Table 1).

Table 1 Participant demographics

Data were obtained from both participant groups using the same methodology. Participants were excluded from the study if they had a prior history of dysphagia, suffered a respiratory disorder that could impact upon swallowing function, were not mobile enough to sit in the X-ray fluoroscope for a swallowing examination, had a prior history of H&N cancer, and/or were unable to provide informed consent for participation in the study.

Procedure

Full approvals from the pertinent ethical committees were obtained prior to the study commencing.

H&N cancer patients’ data were collected at the diagnostic imaging suite at PMCC. The following demographic data were collected from each participant: name, date of birth, age, cancer site, stage of disease, and any previous/current relevant medical history.

The control group’s data were collected at the Radiology Suite, Wm. S. Middleton Veterans Hospital, University of Wisconsin, USA.

Prior to data collection, the KSW was connected to a fluoroscopy unit and to the following attachments: (1) a flexible silicon tongue plate, housing three pressure transducers, to measure oro-lingual swallowing pressures; (2) three sEMG transducers on an adhesive backing, to measure laryngeal activity associated with the moment of swallow occurrence; and (3) a microphone, to audio-record instructions during data collection.

Using the alveolar ridge as an anterior reference point, the silicon strip housing the pressure transducers was adhered, midline, to each participant’s palate, using a strip of double-sided “stomahesive” wafer. This wafer was trimmed to be equal in size and shape to the inferior surface of the tongue plate. Once the silicon strip was fixed to the palate, all three transducers were calibrated simultaneously.

The three laryngeal sEMG transducers were positioned first on the thyroid prominence, and then to the left and right sides of the thyroid cartilage. Participants were positioned lateral to the fluoroscope and the VFSS image was focused to capture the lips anteriorly, the posterior pharyngeal wall posteriorly, the hard palate superiorly, and the cricopharyngeal junction inferiorly [10]. The KSW monitor was customised to display oro-lingual pressures, laryngeal sEMG, and VFSS data simultaneously.

Simultaneous recordings of oro-lingual pressures, VFSS, and laryngeal sEMG were taken as participants swallowed three measured 3-ml teaspoons of a radiopaque (X-OPAQUE-HD, 977 mg/g barium sulfate) semi-solid (custard) bolus. The products used had the same viscous-bolus properties for both participant groups. Participants were instructed to take the food off the teaspoon in a single swallow, as previously described [32].

Following data capture, the sEMG and oro-lingual pressure data from each participant were saved to the hard drive of the KSW. These data were then copied to an Excel spreadsheet, where oro-lingual pressure and sEMG data were converted to graph form, and the graphs were used to identify the peak pressures taken from each of the three transducers. These peak pressures were then checked against the raw data stored on the KSW, with the value for each peak being noted. The VFSS recordings were transferred onto DVD for computer playback, with a time code added to the VFSS studies, enabling these data to be cross-referenced with the oro-lingual swallowing pressure data, thereby accurately determining when a swallow occurred.

Because most participants performed multiple lingual movements to produce an effective swallow, but only one data point was required, for consistency across all participants it was necessary for the researcher to select one representative point for analysis. The lingual movement responsible for clearing the largest amount of bolus was chosen. This point was also independently chosen and rated by a second rater (AP). The time at which the swallow occurred was then cross-referenced with the oro-lingual pressure data, so that the pertinent pressures were analysed.

The maximum oro-lingual pressures generated during each swallow (peak oro-lingual pressures) from the anterior, medial, and posterior transducers were recorded for each of the three swallows performed by each participant. In this way, a total of nine data points of peak oro-lingual pressures were retrieved for each participant.
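The extraction of nine peak values per participant can be illustrated with a minimal sketch. It assumes that each pressure trace has already been cropped to the lingual movement selected for analysis; the function name, data layout, and sample values are hypothetical and are not study data.

```python
# Transducer positions on the fixed tongue plate.
TRANSDUCERS = ("anterior", "medial", "posterior")


def peak_pressures(traces):
    """Return the peak pressure per transducer for each swallow.

    `traces` maps a transducer name to a list of swallows, where each
    swallow is a list of pressure samples already cropped to the
    lingual movement chosen for analysis (an assumption of this sketch).
    """
    return {
        name: [max(samples) for samples in traces[name]]
        for name in TRANSDUCERS
    }


# Made-up samples in arbitrary pressure units (three swallows per transducer).
example_traces = {
    "anterior": [[0, 12, 30, 18], [0, 15, 28, 9], [0, 10, 31, 20]],
    "medial": [[0, 8, 22, 11], [0, 9, 25, 14], [0, 7, 21, 10]],
    "posterior": [[0, 5, 17, 6], [0, 6, 19, 8], [0, 4, 16, 7]],
}

peaks = peak_pressures(example_traces)
# Three transducers x three swallows = nine data points per participant.
n_points = sum(len(values) for values in peaks.values())
```

Keeping the peaks keyed by transducer makes it straightforward to analyse each transducer separately, as is done in the analyses that follow.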

Data Screening and Analysis

Data were screened using a number of techniques, including examination of descriptive statistics (means, medians, standard deviations, skewness, and kurtosis statistics) and visual inspection of histograms and boxplots. Data screening indicated no violations to the assumptions of normality and homogeneity of variance, and no outliers were detected in either the H&N cancer or the control group.
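As an illustration of the skewness and kurtosis statistics mentioned above, the following sketch uses the simple population-moment definitions. Statistical packages often apply bias corrections, so their values may differ slightly; the function names and sample values are hypothetical.

```python
import statistics


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Made-up, perfectly symmetric values (not study data).
sample = [1, 2, 3, 4, 5]
sample_skew = skewness(sample)
sample_kurt = excess_kurtosis(sample)
```

Values of both statistics close to zero are consistent with an approximately normal distribution, although in practice they are read alongside histograms and boxplots rather than in isolation.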

To determine whether there were differences between the oro-lingual pressures of the three swallows (i.e., first, second, or third presentation of the bolus), a series of one-way repeated-measures analyses of variance (ANOVA) were conducted separately for the two groups and for each of the transducers (six models in total). This allowed us to examine stability of measurement [3] and to rule out systematic bias due to either fatigue or learning effects.
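The F statistic of a one-way repeated-measures ANOVA can be sketched as follows for one group and one transducer. This is a minimal illustration assuming complete data (every participant has all three swallows) and no sphericity correction; the p value is not computed because it requires the F distribution, which is not in the Python standard library. The function name and pressure values are hypothetical.

```python
def rm_anova_f(scores):
    """One-way repeated-measures ANOVA.

    `scores` is a list of rows, one per participant, each holding the
    peak pressure for swallows 1..k. Returns (F, df_swallow, df_error).
    """
    n = len(scores)       # participants
    k = len(scores[0])    # repeated swallows
    grand = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    swallow_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_swallow = n * sum((m - grand) ** 2 for m in swallow_means)
    # Residual variation after removing participant and swallow effects.
    ss_error = ss_total - ss_subject - ss_swallow

    df_swallow = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_swallow / df_swallow) / (ss_error / df_error)
    return f_value, df_swallow, df_error


# Made-up peak pressures for four participants (not study data).
example = [[10, 12, 11], [20, 21, 22], [15, 15, 18], [30, 29, 31]]
f_value, df1, df2 = rm_anova_f(example)
```

With three swallows the degrees of freedom are 2 and 2(n − 1), so a sample of 10 participants gives F(2,18). A non-significant F for the swallow effect is what is interpreted here as stability.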

Paired t tests were used to ensure there was no significant difference between the patient and control groups in terms of age (to determine adequacy of matching) and to determine group differences in terms of oro-lingual pressure measurements across the three swallows and three transducers. Because no statistically significant variations in oro-lingual pressures were observed across the three swallows for either group, means of the three swallows were used in the calculation of these t tests.
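A paired comparison on the mean of three swallows can be sketched as below. The sketch assumes one matched control per patient, listed in the same order; names and pressure values are hypothetical and are not study data, and the p value is left to a statistics package.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched values."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


# Made-up peak pressures: one row of three swallows per participant.
control_swallows = [[4, 5, 6], [6, 7, 8], [8, 9, 10], [10, 11, 12]]
patient_swallows = [[3, 4, 5], [4, 5, 6], [5, 6, 7], [8, 9, 10]]

# Each participant is represented by the mean of the three swallows.
control_means = [statistics.fmean(row) for row in control_swallows]
patient_means = [statistics.fmean(row) for row in patient_swallows]

t_value, df = paired_t(control_means, patient_means)
```

A positive t value in this arrangement indicates that the controls, on average, produced higher pressures than their matched patients.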

To determine the reliability of oro-lingual pressure measurements, the two-way mixed-effects model of the intraclass correlation coefficient (ICC(3,1)) was used. The ICC reflects both the degree of correlation and the agreement among scores [3, 43], i.e., how consistent and associated the oro-lingual pressure measurements are across the three swallows. ICC values above 0.75 are indicative of good reliability; values of 0.9 and above are likely to be more reliable in ensuring validity and reproducibility of clinical measurements [3]. ICC(3,1) values were calculated for the two groups and the three transducers separately.
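The ICC(3,1) calculation can be sketched from the mean squares of a two-way ANOVA. The sketch assumes the Shrout and Fleiss single-measure form, ICC(3,1) = (BMS − EMS) / (BMS + (k − 1)EMS), where BMS is the between-participant mean square, EMS is the residual mean square, and k is the number of swallows. It also assumes complete data; the function name and pressure values are hypothetical.

```python
def icc_3_1(scores):
    """Single-measure ICC(3,1) from a participants x swallows table."""
    n = len(scores)       # participants
    k = len(scores[0])    # swallows per participant
    grand = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    swallow_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_swallow = n * sum((m - grand) ** 2 for m in swallow_means)
    ss_error = ss_total - ss_subject - ss_swallow

    bms = ss_subject / (n - 1)               # between-participant mean square
    ems = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Made-up peak pressures for four participants (not study data).
example_icc = icc_3_1([[10, 12, 11], [20, 21, 22], [15, 15, 18], [30, 29, 31]])

# A constant shift between swallows leaves this form of the ICC at 1.
shifted_icc = icc_3_1([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

In this form, a constant shift from one swallow to the next does not lower the coefficient, which is one reason systematic bias across swallows is examined separately with the repeated-measures ANOVA.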

Results

Missing Data

The only missing data point was from one control participant for the posterior transducer on his third swallow. While it was possible that the absent trace may have occurred as a result of instrumental error, this is unlikely because the posterior pressure recordings obtained for the two remaining swallows for that participant were captured without difficulty. It was probable that, during this swallow, the participant simply did not make adequate lingua-palatal contact at the point of the posterior transducer for the recording to occur.

Group Differences

There was no significant difference between the patient and control groups with respect to age or gender, highlighting the adequacy of the individual matching (p = 0.35). The means and standard deviations of peak oro-lingual pressures for the two groups are detailed in Table 2. The means and standard deviations for the anterior, medial, and posterior pressure scores for each of the three swallows of a semi-solid bolus also are provided. Of note, the control group had significantly higher oro-lingual pressure scores across all conditions (all at the 0.001 level).

Table 2 Descriptive, reliability, and stability statistics for both participant groups

Stability of Lingual Pressures

Examination of the stability of measurement across the three swallows for each of the three transducers separately, for both patient and control groups (see Table 2), revealed no significant variation in oro-lingual swallowing pressure measurements across all three swallows (all p values > 0.05).

Reliability of Lingual Pressures

For the H&N cancer group, ICC values ranged from 0.66 (medial transducer) to 0.76 (anterior transducer) (Table 2). For the control group, ICC values ranged from 0.92 (anterior and medial transducers) to 0.95 (posterior transducer) (Table 2). Interestingly, there was no overlap between the 95% confidence intervals (CIs) of the ICCs for the H&N and the controls for the medial and posterior sensor positions.

Discussion

This study is important for two reasons. First, it is the first study in which oro-lingual swallowing pressures in people with H&N cancer are compared with those of a sample of healthy adults. Second, there currently exists no published information regarding the reliability or stability of oro-lingual swallowing pressures across populations, an issue addressed by this study.

Group Differences

Differences in oro-lingual swallowing pressures taken from an H&N cancer population and matched (by gender and age) normal controls were demonstrated. The control group had significantly higher oro-lingual pressure recordings than did the H&N cancer patients. This finding cannot be attributed to age or gender and supports previous reports that patients with H&N cancer have reduced swallowing function (i.e., they propel boluses with less lingual force) compared with normals [13]. The ability of oro-lingual pressures to differentiate between control and patient groups also highlights the discriminant validity of this measure.

Lingual Pressure in Healthy Adults

As a group, the controls demonstrated higher within-group variability (but less intra-subject variability) of oro-lingual pressures, as seen by the wide range of recorded pressures. The control group also demonstrated a higher degree of inter-participant variability of lingual pressures across the three swallows (evidenced by higher standard deviations compared with the H&N cancer group). Higher ICCs for the control group can be explained by less intra-control subject variability relative to between-subject variability. In light of the high degree of inter-subject variability observed in the control group, emphasis needs to be placed on obtaining a larger normative sample to accurately capture this heterogeneity. These findings highlight the need for a taxonomy of normal human swallowing to be developed, against which disordered swallowing can be examined.
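The dependence of the ICC on the ratio of between-subject to within-subject variability can be shown with a short worked example. It assumes the simplified variance-ratio view of the ICC, between-subject variance divided by the sum of between-subject and within-subject variance; the standard deviations below are hypothetical and are not study values.

```python
def expected_icc(between_sd, within_sd):
    """Variance-ratio view of the ICC: between / (between + within)."""
    between_var = between_sd ** 2
    within_var = within_sd ** 2
    return between_var / (between_var + within_var)


# Same within-subject variability, different spread between participants.
heterogeneous_group = expected_icc(between_sd=3.0, within_sd=1.0)
homogeneous_group = expected_icc(between_sd=1.0, within_sd=1.0)
```

With identical within-subject variability, the more heterogeneous group yields the higher coefficient (0.9 versus 0.5 in this example), so a high ICC in a group with a wide spread of pressures does not by itself mean that individual swallows were measured with less error.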

Stability of Lingual Pressures

In this study, good intra-subject stability of oro-lingual swallowing pressures was demonstrated over three consecutive swallows of a semi-solid bolus, for both the H&N cancer group and the control group. The level of within-subject stability obtained in the current study can be attributed to careful adherence to research protocol.

This finding differs from that of Ball et al. [32], in which a significant bias was detected between swallows 1 and 3 for the anterior transducer [F(2,18) = 6.49, p = 0.008, η² = 0.42]. As the bias detected was not a consistent one (i.e., bias existed only between swallows 1 and 2), it was unlikely that it was a result of a learning or fatigue effect. The reduced stability of measure observed by Ball et al. [32] may have been due to the larger bolus size used in that study (5 ml). A larger bolus size increases the opportunity for bolus residue at the transducer site, thereby reducing the stability of the measure. This highlights the importance of considering the effect of bolus size upon measured variables in dysphagia research and the importance of stabilising bolus size, both within and across studies, to enhance the transferability of data and results.

The level of stability demonstrated by both the H&N cancer group and the control group indicates that oro-lingual pressures (recorded with a fixed plate in-situ) may be used as a measure of change over time in swallowing function. Further research needs to be conducted in order to demonstrate the construct validity of using oro-lingual swallowing pressures as a measure of swallowing efficiency.

Reliability of Lingual Pressures

To ensure the clinical utility of a measurement, it is necessary to demonstrate not only that the trait being measured is stable, but also that the measurement tool being used to measure that trait produces reliable measurements. The oro-lingual swallowing pressures recorded from the H&N cancer patients and the control group demonstrated different levels of reliability. For the H&N cancer group, no values reached or exceeded 0.9, which is the value deemed necessary to ensure validity and reproducibility of clinical measurements [3]. The ICC values obtained for the anterior (0.76) and posterior (0.74) sensors either approached or reached the value for a “good” level of reliability (0.75). For the control group, all values exceeded 0.9, the value most likely to ensure validity and reproducibility of oro-lingual pressure measurements [3].

The lower levels of reliability obtained from the H&N cancer group indicate the presence of comparatively higher levels of measurement error when recording oro-lingual swallowing pressures in a sample of H&N cancer patients. Whilst the lingual pressures obtained from both the H&N cancer group and the control group were shown to be statistically stable, it is possible that the comparatively lower reliability scores obtained by the H&N cancer sample indicate greater intra-subject variability in their swallowing behaviour. This is a consideration when recording measures over time with H&N cancer patients.

The ICC scores obtained in this study for the oro-lingual swallowing pressures of H&N cancer patients were comparatively lower than were the ICC scores obtained by Ball et al. [32] from a similar sample. This is a reflection of the larger degree of variation (as evidenced by standard deviations) observed in the Ball et al. study [32] than in the current study, as reliability statistics (such as those obtained by the ICC) increase with increases in variance [3]. The comparatively lower levels of variance observed in the current study may reflect careful adherence to the research protocol and methodologic differences between the two studies, including use of different bolus sizes (current study, 3 ml; Ball et al., 5 ml) and differences in sample size (current study, N = 19, Ball et al., N = 10). The Ball et al. sample size may not have been adequate to provide a true estimate of reliability.

Post-hoc power and sample size calculations were conducted based on the range of the ICCs (min = 0.66, max = 0.98), with power set at 0.90 and α set at 0.05. Those calculations determined that we required a minimum of 12 participants to examine test-retest reliability, with an ICC of 0.66 [44].

In this study, differences in levels of reliability were observed across the three pressure transducers for both participant groups. For the H&N cancer group, the anterior transducer was shown to produce the most reliable measures. This finding is consistent with that of Ball et al. [32]. By contrast, the transducer that demonstrated the highest level of reliability for the control group was the posterior transducer. These differences are likely a reflection of distinctions in deglutitive tongue function between the two groups.

Study Limitations

The limitations of this study are acknowledged as follows: (1) a small sample size; it has been suggested that a study must contain a minimum of 50 participants in order to provide a reasonably precise estimate of the reliability of a measurement tool [45]. Data provided by this preliminary study will, however, enable more precise power calculations to be undertaken; (2) the specificity of subjects in this study; as all participants had to demonstrate a limited gag reflex to tolerate the plate for recording oro-lingual swallowing pressures, it is possible that these data may not be representative of the entire population; (3) the effect of factors such as gender, age, and different bolus sizes on oro-lingual swallowing pressures remains to be examined; and (4) heterogeneity of H&N cancer patients in terms of tumour size/site could not be examined (due to small numbers) in this study.

The effect of “fatigue” needs more consideration. We do not know which impacts on functional efficiency more: the effort from a single, strong lingual pressure or that of multiple, smaller lingual movements. Quality data would be needed from a larger study to examine such effects on swallowing.

Clinical Implications

Oro-lingual swallowing pressures, captured using the method described in this study, can be used as a reliable method of measuring change in swallowing over time in both H&N cancer patients and normal subjects. It is advisable that the recordings of oro-lingual swallowing pressures used for analysis be taken from the anterior transducer for H&N cancer patients, because this transducer was shown to capture oro-lingual swallowing pressures with a good level of reliability.

Conclusion

In this study, there existed significant differences between two samples (H&N cancer and controls) with respect to mean peak oro-lingual pressures recorded during swallowing and the reliability of these measures, when captured using the KSW. These sample differences highlight the importance of obtaining information about the reliability of dysphagia assessment tools with the specific population with whom they will be used. There is currently no accepted taxonomy of normal swallowing against which abnormality and/or unsafe swallowing behaviours can be assessed and categorised. Without such information, oro-lingual swallowing pressures, described in this study, cannot be used reliably as a measure of normal versus disordered deglutition. Further study is required in this area.