Introduction

Accurate, non-invasive assessment of swallow function is vital in the clinical setting in order to correctly identify patients with dysphagia who are at risk of aspiration and of complications that arise secondary to aspiration, such as aspiration pneumonia, malnutrition, and dehydration. Current clinical dysphagia screening methods have high sensitivity but poor specificity, which results in over-identification of people with dysphagia [1,2,3]. This is because dysphagia screening protocols rely on subjective human judgment of risk factors and on observing patients drink a limited amount of liquid and, by their nature, do not measure any aspect of swallow physiology. There is also a risk of false negatives with current dysphagia screening methods due to the asymptomatic nature of silent aspiration. Poor specificity of dysphagia screening methods results in misuse of time and resources on unnecessary, expensive procedures such as videofluoroscopy, which remains one of the gold standards for assessing swallowing physiology. While videofluoroscopic swallow studies (VFSSs) are useful for characterizing swallow function, for many patients they are not always feasible or available in a time frame that enables rapid diagnostic assessment, leaving clinicians to temporarily manage cases as best they can with the available clinical information. Therefore, there is high demand for non-invasive methods that accurately screen and assess swallowing, that might also provide insight into underlying swallowing physiology, and that increase access to dysphagia assessment for underserved patients.

High-resolution cervical auscultation (HRCA) is an emerging method for non-invasively screening several aspects of swallow function that has demonstrated promising preliminary evidence of its effectiveness [4]. HRCA combines the use of acoustic signals from a contact microphone, vibratory signals from a tri-axial accelerometer, and signal processing and machine learning techniques to effectively characterize swallow function. Non-invasive neck sensors are placed on the anterior laryngeal framework at the cricoid cartilage to record signals that occur during swallowing. To this date, our database consists of concurrent VFSS and HRCA recordings from 274 patients with suspected dysphagia and 70 community dwelling healthy adults. We are analyzing the data in our database in a systematic way (e.g., one temporal swallow kinematic event at a time, one patient population at a time) to evaluate the potential of HRCA as an effective dysphagia screening method. HRCA signals combined with signal processing and machine learning techniques have demonstrated the ability to automatically detect swallowing events with similar accuracy to trained human judges and to effectively differentiate between safe and unsafe swallows by approximating VFSS judgments made using the penetration-aspiration scale [4,5,6,7,8,9,10]. We are examining the association between HRCA signals and scores of physiological components on the Modified Barium Swallow Impairment Profile (MBSImP) [11] and are finding promising levels of agreement in patients with suspected dysphagia. Results have revealed statistically significant associations between HRCA signals and anterior hyoid bone movement (component #9), pharyngoesophageal segment opening (component #14), and pharyngeal residue (component #16) [12,13,14]. In addition to this, we have found a strong association between HRCA signal features and hyoid bone displacement [15,16,17]. 
A recent study examining hyoid bone displacement found that ≥ 50% of the body of the hyoid bone could be accurately tracked on each frame using HRCA signals and machine learning techniques alone in healthy community dwelling adults and patients with suspected dysphagia [18]. HRCA signals combined with machine learning techniques have demonstrated effectiveness in detecting other kinematic swallowing events including laryngeal vestibular closure and upper esophageal sphincter (UES) opening with a high degree of accuracy in healthy community dwelling adults and patients with suspected dysphagia [19,20,21].

While HRCA has been used to detect penetration and aspiration, clinical ratings of physiological events of swallowing using the MBSImP, and various kinematic events of swallowing, it has not previously been used to characterize swallow function in specific patient populations. Patients with neurodegenerative diseases often experience progressive dysphagia along with other physical mobility impairments, which greatly impact their quality of life [22,23,24,25,26,27]. Dysphagia in patients with neurodegenerative diseases is frequently characterized by impaired bolus preparation and propulsion, impaired mastication, reduced oral containment, oral residue, impaired tongue movement, impaired pharyngeal timing/coordination, pharyngeal residue, and penetration/aspiration [28]. While VFSSs remain the primary method for assessing swallow function in patients with neurodegenerative diseases, there are limitations to implementing instrumental swallow evaluations in patients with progressive, degenerative diseases [28, 29]. Because of their multifactorial health problems, physical mobility impairments, and transportation issues, it can be challenging for patients with neurodegenerative diseases to undergo VFSSs as outpatients at medical facilities. In addition to this, patients with neurodegenerative diseases are at increased risk of fatigue over the course of a meal and may have fluctuating swallow function day to day, which is a challenge to capture during short instrumental swallow evaluations [29]. Moreover, because of the progressive nature of neurodegenerative diseases, it is advantageous to monitor swallow function more closely over time in order to predict and mitigate adverse events that may occur secondary to progressing dysphagia such as aspiration pneumonia. 
Completing frequent instrumental swallow evaluations such as VFSSs or fiberoptic endoscopic evaluation of swallowing (FEES) to monitor swallowing throughout disease progression is costly, burdensome to patients and caregivers, and relatively invasive (e.g., exposure to radiation, discomfort) [29]. Because of the high prevalence of dysphagia and the variety of kinematic changes in swallow function that occur throughout disease progression, people with neurodegenerative diseases, among other patient populations, would benefit from a non-invasive, inexpensive, and easily transportable device, such as one based on HRCA, for making inferences about swallow function. Therefore, this study investigated the ability of HRCA to broadly differentiate (i.e., screen) between swallows from healthy people and swallows from people with neurodegenerative diseases. We hypothesized that HRCA would accurately differentiate these two classes of swallows by identifying significant differences in vibratory and acoustic signal features between swallows from healthy people and swallows from people within a single class of “people with neurodegenerative diseases.”

Methods

Equipment and Procedures

This study was approved by the Institutional Review Board at the University of Pittsburgh and all participants provided informed written consent. Data analysis for this study was conducted on two separate data sets that were collected at two different time points in a similar fashion. The first data set consisted of 170 thin liquid swallows from 20 patients with various neurodegenerative diseases (10 males) aged 35–82 years, with a mean age of 61.25 years. Diagnoses included Parkinson’s disease (PD), myasthenia gravis, motoneuron disease, multiple sclerosis (MS), muscular dystrophy (MD), amyotrophic lateral sclerosis (ALS), myotonic dystrophy, and progressive muscle weakness not otherwise specified. All patients underwent VFSSs at the University of Pittsburgh Medical Center Presbyterian hospital due to suspected dysphagia. Patients were imaged in the lateral plane. VFSSs on patients were completed as a part of their clinical care rather than for research purposes alone. For this reason, patients were examined under a variety of bolus volumes and consistencies and were asked to perform compensatory maneuvers (e.g., chin tuck) as deemed appropriate based on the clinical presentation of dysphagia. See Table 1 for the bolus characteristics of all swallows from the patient data that were included in data analysis for this study.

Table 1 Bolus characteristics for all swallows included in the neurodegenerative patient data set

The second data set consisted of 171 thin liquid swallows from 51 healthy community dwelling adults between the ages of 39–87 with a mean age of 67.21 (22 males). Inclusionary criteria for healthy community dwelling adults included no prior history of swallowing difficulties, neurological disorder, surgery to the head or neck region, or chance of being pregnant based on participant report. For healthy participants, data collection also occurred in the same institution under a separate IRB approval. Participants were imaged in the lateral plane. In contrast to the patients with neurodegenerative diseases, the healthy community dwelling adults underwent a standardized (i.e., five 3 mL boluses by spoon and five unmeasured self-selected “comfortable” cup sips in head neutral position) and short (average fluoro time of 0.66 min) VFSS procedure of ten thin liquid boluses administered in random order to minimize radiation exposure. For spoon presentations, the researcher instructed participants to “Hold the liquid in your mouth until I tell you to swallow it.” For cup presentations, the researcher instructed participants to “Take a comfortable sip of liquid and swallow it whenever you’re ready.” See Table 2 for the bolus characteristics for all swallows included in data analysis from the healthy community dwelling adults for this study. For the purposes of this study and to effectively compare between groups, only thin liquid swallows administered by cup and spoon were included for data analysis, because only thin liquid swallows were collected from the healthy community dwelling adults.

Table 2 Bolus characteristics for the swallows included in the healthy community dweller data set

A standard fluoroscopy system (Ultimax system, Toshiba, Tustin, CA for the patient data collection; Precision 500D system, GE Healthcare, LLC, Waukesha, WI for the healthy community dwelling adult data collection) set at a continuous pulse rate of 30 pulses per second (PPS) was used to obtain swallowing video segments. To capture the raw videos directly from the x-ray apparatus at a rate of 60 or 73 frames per second, we used a frame grabber module (AccuStream Express HD, Foresight Imaging, Chelmsford, MA). Once data collection was complete and prior to conducting kinematic analysis of swallowing, the videos were downsampled from 60 or 73 frames per second to 30 frames per second to remove the duplicate frames that had been inserted into the videos by the oversampling in the frame grabber, which was necessary to align with the higher sampling rate of the signal acquisition system. This step produced accurate 30 frames per second videos for analysis. To obtain HRCA signals during concurrent VFSS, a tri-axial accelerometer (ADXL 327, Analog Devices, Norwood, Massachusetts) and a contact microphone were attached with tape to the anterior laryngeal framework at the level of the cricoid cartilage. Prior to placing the non-invasive neck sensors, researchers cleaned the skin of each participant’s anterior neck with alcohol pads. To ensure adequate signals were obtained from the sensors, the accelerometer and contact microphone were placed in custom casings to allow for flat contact surfaces with the skin. The accelerometer was placed at midline at the cricoid arch, and the contact microphone was placed to the right of midline and inferior to the accelerometer, in order to obtain the best x-ray images and signals and so as not to interfere with imaging of the upper airway. For each participant, we aligned the axes of the tri-axial accelerometer (anterior–posterior, superior-inferior, and medial–lateral) with the participant’s neck. 
The exact placement of the non-invasive neck sensors can be viewed in Fig. 1 [4, 30]. The accelerometer was powered by a power supply with a 3 V output (model 1504, BK Precision, Yorba Linda, California). Following data collection with the accelerometer, the raw signals were bandpass filtered (model P55, Grass Technologies, Warwick, Rhode Island) from 0.1 to 3000 Hz and amplified ten times. Then, the signal from each accelerometer axis was fed into a data acquisition device (National Instruments 6210 DAQ) and recorded at a sampling rate of 20 kHz using the Signal Express program within LabView (National Instruments, Austin, Texas). To overcome measurement errors and because multiple kinematic events occur simultaneously during swallowing, the signals were downsampled to 4 kHz prior to analysis.
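The reduction from 20 kHz to 4 kHz corresponds to an integer factor of 5, which can be illustrated with a minimal sketch. The text does not state which resampling or anti-aliasing method was used, so the block averaging below is an assumption chosen for simplicity, and the function name `downsample` and the synthetic input signal are hypothetical.

```python
def downsample(signal, factor=5):
    """Average each block of `factor` samples into one output sample.

    Block averaging is a crude low-pass filter followed by decimation.
    It is an assumed stand-in, not the method reported in the study.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(signal) // factor  # a trailing partial block is dropped
    return [
        sum(signal[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


ORIGINAL_RATE_HZ = 20_000  # recording rate stated in the text
TARGET_RATE_HZ = 4_000     # analysis rate stated in the text
factor = ORIGINAL_RATE_HZ // TARGET_RATE_HZ  # 5

# One second of synthetic data standing in for one accelerometer axis.
one_second = [float(i % 10) for i in range(ORIGINAL_RATE_HZ)]
reduced = downsample(one_second, factor)
print(len(reduced))  # 4000 samples, i.e. one second at 4 kHz
```

In practice a dedicated anti-aliasing filter would usually precede decimation; the sketch only shows how the sample count and rate relate.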

Fig. 1 Neck sensor placement during data collection

Kinematic swallow analyses: Before performing swallow segmentation, raters were trained and tested in swallow kinematic analyses. Intra- and inter-rater reliability were assessed with intra-class correlation coefficients (ICCs) [31], which were greater than 0.99 for both measures. VFSSs were segmented into individual swallows for analysis. The onset of the swallow was defined as the frame in which the bolus head passed the shadow of the ramus of the mandible, and the offset of the swallow was defined as the frame in which the hyoid returned to its lowest position after clearance of the bolus tail through the UES. Ongoing intra-rater reliability during swallow segmentation was assessed to control for drift by having raters randomly select one out of every ten swallows to re-analyze and compute ICCs. Inter-rater reliability for swallow segmentation was assessed on 10% of swallows, with ICCs of 0.99 or above for all trained raters. Since the purpose of this study was only to determine whether there was a difference in HRCA signal features between swallows from healthy people and swallows from patients with neurodegenerative diseases, no swallow kinematic analyses were performed aside from swallow segmentation.
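Because swallow onset and offset are defined on video frames while the features are computed on the HRCA signals, each frame range has to be mapped to a range of signal samples. The sketch below shows one way this mapping could work, assuming that the video and the signals share a common time origin and that frames are counted from zero; neither detail is stated in the text, and the function names are hypothetical.

```python
VIDEO_FPS = 30          # frame rate of the videos used for analysis
SIGNAL_RATE_HZ = 4_000  # HRCA sampling rate after downsampling


def frame_to_sample(frame_index, fps=VIDEO_FPS, rate_hz=SIGNAL_RATE_HZ):
    """Signal sample index at the start time of a video frame."""
    return round(frame_index * rate_hz / fps)


def swallow_segment(signal, onset_frame, offset_frame):
    """Slice the samples spanning the onset frame through the offset frame."""
    start = frame_to_sample(onset_frame)
    stop = frame_to_sample(offset_frame + 1)  # include the whole offset frame
    return signal[start:stop]


# Two seconds of placeholder samples; a swallow rated as frames 15 to 44.
signal = list(range(2 * SIGNAL_RATE_HZ))
segment = swallow_segment(signal, onset_frame=15, offset_frame=44)
print(len(segment))  # 4000 samples, i.e. one second of signal
```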

Preprocessing and Feature Extraction from HRCA Signals

In order to reduce the multi-source noise associated with the vibratory and acoustic signals of HRCA, each component was filtered to remove the device noise. These filters were designed based on the output of each sensor when no input was present using an auto-regressive model. Head movement interference was removed using a fourth-order splines approximation algorithm [32, 33]. Any additional noise component that existed was removed using wavelet de-noising. This preprocessing procedure has previously demonstrated its effectiveness in many studies that investigated the use of HRCA signals in swallow kinematic analysis [15, 17, 21, 34]. Features that have proven to be significant to swallow kinematics and swallowing disorders based on previous research studies [15, 17, 21, 34] were then extracted from the HRCA signals in order to determine the association between HRCA signals and the diagnostic class (i.e., neurodegenerative disease) of the patient. A summary of the features used and the definition of each appears in Table 3.

Table 3 Summary of the features extracted from HRCA signals
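Several of the features named in this study (standard deviation, skewness, kurtosis, peak frequency, centroid frequency, and bandwidth) can be sketched in a few lines. The definitions below are common textbook forms and are assumptions, since the exact definitions in Table 3 are not reproduced here; the function names are hypothetical, and the naive discrete Fourier transform is used only to keep the example free of dependencies.

```python
import cmath
import math


def time_features(x):
    """Standard deviation, skewness, and kurtosis from central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def power_spectrum(x, rate_hz):
    """One-sided power spectrum via a naive O(n^2) DFT."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        freqs.append(k * rate_hz / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def spectral_features(x, rate_hz):
    """Peak frequency, power-weighted centroid, and spread around it."""
    freqs, power = power_spectrum(x, rate_hz)
    total = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    return {
        "peak_frequency": freqs[power.index(max(power))],
        "centroid_frequency": centroid,
        "bandwidth": math.sqrt(variance),
    }


# Synthetic test signal: a 200 Hz sine sampled at 4 kHz for 0.1 s.
RATE_HZ = 4_000
tone = [math.sin(2 * math.pi * 200 * t / RATE_HZ) for t in range(400)]
print(time_features(tone))
print(spectral_features(tone, RATE_HZ))
```

For a pure tone the peak and centroid frequencies coincide and the bandwidth is close to zero; real swallowing signals spread power across many frequencies, which is what makes these features informative.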

Data Analysis

We fit a series of linear mixed models to examine the association between each of 36 HRCA signal features and group (swallows from healthy people vs. swallows from people with neurodegenerative diseases). Support vector machine (SVM), Naïve Bayes, logistic regression, and decision tree classifiers, all of which are supervised machine learning techniques, were constructed to differentiate between swallows from patients with neurodegenerative diseases and swallows from healthy subjects based on either the entire set of features extracted from the HRCA signals or a subset identified as statistically significant by the linear mixed models or derived by a feature reduction method. This yielded three training procedures for the classifiers: the first used the entire set of features extracted from HRCA signals (36 features); the second used only the features found to be significant in the statistical analysis (22 features); and the third trained the classifiers after performing a principal component analysis (PCA) on the features, a dimensionality reduction method that replaces the original features with linearly uncorrelated components. SPSS (IBM, Armonk, NY) was used for fitting the linear mixed models, while MATLAB (The MathWorks, Inc., Natick, MA) and R (The R Foundation) were used to build and evaluate the classifiers. The performance of each classifier was evaluated through a leave-one-out procedure. This procedure involves training the classifier on the whole set of swallows from both groups except for one swallow, which is held out to test whether it is classified correctly; the process is then repeated until every swallow has served as the test sample. 
To determine whether the swallow is classified correctly, the labels from VFSS images are used as the “ground truth.” The accuracy, sensitivity, and specificity of classification between healthy and neurodegenerative disease swallows were calculated based on the number of correctly classified swallows during the evaluation process with respect to the complete set of swallows from both groups.
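The leave-one-out evaluation and the three reported metrics can be sketched as follows. The nearest-centroid rule is a deliberately simple stand-in and is not one of the four classifiers used in this study; the toy feature vectors are invented for illustration and are not study data; and treating the neurodegenerative class (label 1) as the positive class is an assumption.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier: return the label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(features, labels, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and predict it."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(predict(train_x, train_y, features[i]))
    return predictions


def screening_metrics(labels, predictions, positive=1):
    """Accuracy, sensitivity, and specificity from paired labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    tn = sum(1 for y, p in pairs if y != positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # assumes at least one positive label
        "specificity": tn / (tn + fp),  # assumes at least one negative label
    }


# Toy data: 0 = healthy, 1 = neurodegenerative (hypothetical values).
features = [[0.1, 1.0], [0.2, 0.9], [0.0, 1.1],
            [1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]
labels = [0, 0, 0, 1, 1, 1]

predictions = leave_one_out(features, labels)
print(screening_metrics(labels, predictions))
```

Any classifier exposing the same `predict(train_x, train_y, sample)` shape could be substituted for the stand-in without changing the evaluation loop.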

Results

Results from the linear mixed models revealed that 22 HRCA signal features extracted from the microphone and tri-axial accelerometer were statistically significant (p < 0.05) for predicting whether swallows were from healthy people or from patients with neurodegenerative diseases (see Table 4). Statistically significant HRCA signal features from microphone signals included: standard deviation, skew, centroid frequency, and bandwidth; from accelerometer anterior–posterior axis: standard deviation, centroid frequency, bandwidth, and wave entropy; from accelerometer superior-inferior axis: standard deviation, kurtosis, Lempel–Ziv, entropy rate, centroid frequency, bandwidth, and wave entropy; and from accelerometer medial–lateral axis: standard deviation, kurtosis, Lempel–Ziv, entropy rate, centroid frequency, peak frequency, and bandwidth. Figures 2 and 3 show a density plot and a power spectral density plot from the HRCA microphone signals that demonstrate the differences in standard deviation and peak frequency between the swallows from a healthy person and a person with a neurodegenerative disease. Among the classifiers used, logistic regression and decision trees provided the best performance, outperforming SVM and Naïve Bayes, with 99% accuracy, 100% sensitivity, and 99% specificity when using the full set of HRCA signal features (see Table 5).

Table 4 Summary of the statistically significant HRCA signal features associated with differentiating between swallows from healthy people and swallows from patients with neurodegenerative diseases
Fig. 2 Density plot from the HRCA microphone signals showing the difference in standard deviation between the swallows from a healthy person and a person with a neurodegenerative disease

Fig. 3 Power spectral density plot from the HRCA microphone signals showing the difference in peak frequency between the swallows from a healthy person and a person with a neurodegenerative disease

Table 5 Performance of classifiers used to differentiate between swallows from healthy people and swallows from patients with neurodegenerative diseases

Discussion

This is the first study to date to use HRCA to differentiate between healthy swallows and swallows from people in a category of underlying disease that commonly results in dysphagia. Because this is the first study to explore this question, it will be important to replicate it with a larger sample of people with neurodegenerative diseases and with additional patient populations. We found that HRCA combined with statistical methods and machine learning techniques could differentiate between swallows from healthy people and swallows from people with a variety of neurodegenerative diseases with a high degree of accuracy. While the results do not characterize the nature of swallowing physiology that differed between the two groups, we accomplished our intended aim of providing a screening-level differentiation between “normal” and “neurodegenerative disease” swallows. While these preliminary results are promising, they do not by any means provide discrete diagnostic/physiologic information; therefore, it will be important to expand this work to gain insight into the underlying swallowing physiology that may contribute to the statistically significant differences in signal features between these two groups. However, the importance of identifying and differentiating a class of swallows that is distinctly different from “normal” swallows cannot be overstated, given the typical pattern of subclinical signs and symptoms during the early progression of neurodegenerative diseases and the fact that many such patients are not identified before clinically important dysphagia ensues. Characterizing the safety and efficiency of swallow function in patients with neurodegenerative diseases is important but challenging due to the multiple and heterogeneous disease-related factors which contribute to dysphagia, including weakness, spasticity, rigidity, motor unit deactivation, and atrophy of muscles secondary to motoneuron deterioration. 
Due to the heterogeneity and progressive nature of neurodegenerative diseases, there is a need for an individualized approach to dysphagia management with close monitoring of swallow function over time to maximize quality of life and to prevent adverse outcomes that can result in faster disease progression. A readily deployable and portable device using HRCA, which can non-invasively monitor and classify swallow function as disordered or not disordered within the variety of clinical settings occupied by these patients, and even in the home would be beneficial toward a goal of early identification and referral for many patients with neurodegenerative diseases.

Future studies should examine the ability of HRCA to characterize and distinguish swallows from healthy people and swallows from specific neurodegenerative diseases (e.g., ALS only) as well as other patient populations that have dysphagia, and the ability of HRCA to characterize swallows between various patient populations that have dysphagia (e.g., ALS vs. patients who have had a stroke) to determine whether HRCA may have diagnostic value. In addition to this, future work should refine HRCA methods to further characterize swallow function of specific patient populations to broadly differentiate between safe and unsafe swallows, and as a potential adjunct to dysphagia diagnostics, to quantify a variety of swallowing kinematic measurements such as hyoid bone displacement, and predict laryngeal vestibular closure and UES opening [4,5,6,7,8,9,10, 19,20,21, 34]. Other areas of potential interest would be examining the ability of HRCA signals to differentiate between dysphagia severity levels in specific patient populations based on swallowing safety and efficiency, as well as improvement or deterioration of swallowing function as a function of disease progression or treatment. The robustness of the machine learning algorithm used in this study should also be improved by including a larger variety of bolus consistencies and swallow conditions in future studies. Expanding upon the current study with this future work will result in more advanced and accurate non-invasive screening, and potentially, characterization of swallowing physiology across a variety of patient populations to more quickly and accurately identify and treat swallowing impairments when imaging instrumentation is temporarily unavailable or undesired by patients, or otherwise not feasible. 
Given clinician reliance on all available information, the addition of accurate and quantitative, non-invasively obtained data regarding swallow function will be a valuable adjunct to the screening process, and hopefully in the future to the diagnostic process, in all but the most ideal clinical situations in which all diagnostic methods are available.

Limitations

The main purpose of this study was to broadly characterize and classify swallows between two groups using HRCA rather than to characterize swallow function based on bolus or swallowing conditions during VFSSs. While we included only single thin liquid swallows for data analysis between the two groups, it is important to note that the data collection methods for patients with neurodegenerative diseases were consistent with clinical care, while the data collection methods for the healthy community dwelling adults were consistent with a standardized research VFSS protocol. Each data collection method has strengths and limitations: methods consistent with clinical care result in improved generalizability and real-world application (external validity), while methods that follow a strict research protocol result in increased internal validity. Another limitation of this study was the heterogeneous group of patients with neurodegenerative diseases. Due to the small sample size of individuals with neurodegenerative diseases within our database, we included a variety of diseases within this classification category. While the presentation and severity of dysphagia may vary across these diseases, the ability of the machine learning algorithm to differentiate between healthy swallows and swallows from people with a variety of neurodegenerative diseases with a high degree of accuracy, sensitivity, and specificity demonstrates the robustness of the machine learning algorithm. For this study, we included a relatively large sample of swallows within each group (170 and 171). However, it will be important to test the accuracy of this algorithm on larger data sets that consist of the same and different bolus textures and volumes, swallows from individual neurodegenerative diseases (e.g., ALS only), and swallows from other diseases that result in dysphagia.

Conclusion

This study found that HRCA signal features combined with statistical methods and machine learning techniques could predict whether swallows were from healthy people or from patients with neurodegenerative diseases with a high degree of accuracy (99%), sensitivity (100%), and specificity (99%). These results provide preliminary evidence that HRCA may be a beneficial method to further explore in future studies to determine whether it can be used to characterize swallows between different patient populations and to characterize whether non-invasive data collected during swallows exhibit evidence of impairment when imaging is not available or feasible. The ability to differentiate between swallows from different patient populations combined with the ability to non-invasively differentiate between safe and unsafe swallows and predict swallow kinematic events would make HRCA a useful dysphagia screening method with future potential to be a diagnostic adjunct to instrumental swallowing evaluations.