Cervical auscultation (CA) is becoming an increasingly researched methodology for the evaluation of dysphagia in adults and children. CA has been available since the invention of the stethoscope. However, its clinical acceptance and application have been limited by subjective, verbal descriptions of swallow-associated sounds and nonstandardized techniques. Our efforts in this field have focused on decreasing the subjectivity of the analysis of CA data by providing objective, measurable parameters that are reproducible across populations.

We previously studied the stability of swallow-associated sounds in preterm infants with and without bronchopulmonary dysplasia (BPD) using a technique that we have labeled the variance index or VI [1, 2]. This method used CA data collected with an accelerometer placed over an infant’s throat during feeding. Using this technique we were able to show developmental differences between low-risk preterm infants and infants affected by BPD. Specifically, it was shown that the swallow-associated sounds of low-risk preterm infants become more uniform with advancing postmenstrual age (PMA) and that this developmental progression does not occur in infants affected by BPD. However, no data are available on the stability of swallow-associated sounds in term infants or adults. Furthermore, there remains disagreement among investigators as to the proper acoustic detection device (microphone or accelerometer) to use for CA studies.

Method

Twenty healthy adult volunteers, ten from each of two research sites, were selected to participate in this project. None of the participants had a history of head or neck surgery and no skin disease was present in the cervical area of any of the participants. Informed consent was obtained from all study participants. This study was approved by the Institutional Review Board (IRB) of the University of Maryland Medical System (UMMS) and the University of Kentucky (UK) and complied with all applicable HIPAA standards. Swallowing studies were carried out in a quiet administrative office setting.

Swallow-associated sounds were collected with both an accelerometer (Vibro-meter Corp., Boston MA, Model 501-FB) and an electret microphone (Optimus [RadioShack/Tandy Corp], Fort Worth TX, Model 33–3013 at UMMS; Audio-Technica, Stow OH, Model ATR35s at the University of Kentucky). Figure 1 shows a schematic representation of the configuration of the study subject and equipment. The microphone was configured in a custom assembly by cutting the tubing of a standard neonatal stethoscope approximately 1.5 in. from the head assembly. The microphone condenser was connected to the cut end of the tubing with heavy surgical tubing to allow collection of the acoustic signals from the diaphragm side of the stethoscope. The stethoscope head and the accelerometer were fixed with double-sided adhesive tape to the right side of the subject’s neck over the lateral border of the trachea immediately inferior to the cricoid cartilage. This site has previously been shown to be the optimal location for detection of swallowing sounds in adults [3]. The diaphragm side of the stethoscope was used because it provides a larger surface area for adhesion of the tape to the subject’s neck. The two acoustic detection devices were connected to an analog multitrack master cassette tape recorder (Tascam/TEAC Professional Division, Montebello CA, Model 414MKII), which kept the two channels precisely aligned in time with wide enough track separation to prevent “bleed-through” of the signals. The microphone and accelerometer were calibrated by applying them to one of the coinvestigators and adjusting the input levels so that the two signals had the same relative amplitude during swallowing. Analog data were converted to digital format by recording the multitrack output to a digital recorder (Nomad Jukebox, Creative Technology Ltd, Milpitas, CA) at a sample rate of 44.1 kHz. Previous studies of swallow-associated acoustic signals have used sample rates as low as 16 kHz, allowing reliable analysis of signals up to 8 kHz.
Because of advances in technology, we can now handle acoustic data files of CD-quality sound (44.1 kHz), a sample rate that, by the Nyquist theorem, allows signals up to 22.05 kHz to be analyzed, above the upper limit of human hearing. The digital files were transferred to a laptop computer as PCM .wav files for signal analysis. At the University of Kentucky (UK), the analog recordings were transferred directly to the laptop computer without the intermediate digital recorder. This change in file handling did not alter the acoustic or accelerometric data and did not appear to affect the results.
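The sample-rate reasoning above can be checked with simple arithmetic. The Python sketch below computes the Nyquist limit (half the sample rate) and the uncompressed PCM data rate. The 16-bit sample depth and the two channels (one per detection device) are assumptions chosen to match CD-quality audio; the text does not state the bit depth.

```python
import sys


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable without aliasing: half the sample rate."""
    return sample_rate_hz / 2


def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Uncompressed PCM data rate in bytes per second (ignores .wav header bytes)."""
    return sample_rate_hz * channels * bits_per_sample // 8


if __name__ == "__main__":
    for rate in (16_000, 44_100):
        print(f"{rate} Hz sampling -> signals up to {nyquist_limit_hz(rate):.0f} Hz")
    # Assumed recording format: 16-bit samples, 2 channels (accelerometer + microphone)
    per_minute = pcm_bytes_per_second(44_100, 2, 16) * 60
    print(f"One minute of 2-channel, 16-bit audio: {per_minute / 1e6:.1f} MB", file=sys.stdout)
```

Under these assumptions, each minute of two-channel recording at 44.1 kHz occupies roughly 10.6 MB, which gives a sense of the file sizes involved.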

Fig. 1

Schematic of equipment used in the study. The microphone and accelerometer are fixed side-by-side on the volunteer’s neck. Both devices are attached to a multitrack recorder to maintain temporal alignment between the channels. The analog output was digitized and stored on a computer for analysis. (a) Method used at UMMS including the digital recorder. (b) Method used at UK in which the analog files were converted directly to digital format. This change did not affect the data or results

After the acoustic detectors were fixed in place, each participant was asked to swallow three different items in turn. First, the participant drank a thin liquid: 11 participants drank water, 6 drank coffee, and 3 had soft drinks. Next, the participant ate a pureed item, applesauce (Mott’s Natural Style Unsweetened, Motts Inc., Stamford, CT). The participant was asked to fill the mouth with several spoonfuls of applesauce and then to swallow the bolus. The final item was a solid food, a cookie. Participants at UMMS were given a choice of several types of cookies, whereas participants at UK were offered only one type. The participant was instructed to place the cookie in the mouth, chew it thoroughly, and then swallow. Swallowing sounds were recorded for each bolus consistency for each participant.

Recorded swallow-associated sounds were analyzed with a combination of available software packages. Time envelope data were displayed with Cool Edit 2000 v1.1 (Syntrillium Software Inc., Phoenix, AZ). Swallow-associated sounds were analyzed in terms of their time envelope and frequency content. It was often convenient to convert the PCM .wav data to ASCII text and view them in spreadsheet format (Excel, Microsoft Inc., Redmond, WA) for display and analysis. The durations of the signals associated with the entire swallow, and with the components of the swallow, were compared between the accelerometric and acoustic data.
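The PCM .wav to ASCII conversion was done with the packages named above. For readers who prefer a script, here is a minimal sketch using only Python's standard `wave` and `array` modules. It assumes 16-bit PCM (the bit depth is not stated in the text), and the tab-separated layout with a leading time column is our own choice, not necessarily the format used in the study.

```python
import array
import io
import sys
import wave


def wav_to_frames(src):
    """Read a 16-bit PCM .wav (path or file object) into per-frame tuples of channel values."""
    with wave.open(src, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        n_channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # .wav sample data are little-endian
    # Samples are interleaved: frame k holds entries k*n .. k*n + n - 1
    frames = [tuple(samples[i:i + n_channels]) for i in range(0, len(samples), n_channels)]
    return frames, sample_rate


def write_ascii(frames, sample_rate, dst):
    """Write tab-separated text: time in seconds, then one column per channel."""
    for k, frame in enumerate(frames):
        values = "\t".join(str(v) for v in frame)
        dst.write(f"{k / sample_rate:.6f}\t{values}\n")


def wav_bytes_to_ascii(data: bytes) -> str:
    """Convenience wrapper: .wav file contents in, tab-separated text out."""
    frames, rate = wav_to_frames(io.BytesIO(data))
    out = io.StringIO()
    write_ascii(frames, rate, out)
    return out.getvalue()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python wav_to_ascii.py recording.wav > recording.txt
    frames, rate = wav_to_frames(sys.argv[1])
    write_ascii(frames, rate, sys.stdout)
```

The resulting text file opens directly in a spreadsheet, with one column per channel (for example, accelerometer and microphone).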

Initial discrete sounds (IDS), signals associated with the onset of pressure increase in the pharynx during swallow, were identified in the adult accelerometric data by their morphology and location at the onset of a swallow-associated signal. The VI of the IDS signals was calculated with the method described in a previous publication [2]. Briefly, the IDS were “cut and pasted” to a new file, which was converted to ASCII text for display in spreadsheet format. The IDS were scaled to allow comparisons between individuals and aligned by the major deflection. The variance at each point on the waveform was calculated, and the average variance over the entire waveform was computed. The VI of the IDS signals from the adults was compared with that of the previously studied control group of low-risk preterm infants [1, 2]. Statistical analysis was performed using SigmaStat 2.0 (Jandel, San Rafael, CA). Data on the duration of swallow components were normally distributed and were therefore analyzed with parametric methods. Data on the VI of the swallow components were not normally distributed and were therefore analyzed with nonparametric methods.
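The VI steps just described (scale each IDS, align on the major deflection, compute the variance at each point, then average) can be sketched in Python as follows. This is not the published implementation [2]. The scaling rule (largest absolute value set to 100), the use of the sample variance, the window lengths `pre` and `post`, and the polarity flip for IDS that begin with a negative deflection are all our assumptions.

```python
from statistics import mean, variance


def align_on_major_deflection(waveform, pre, post):
    """Cut a window of pre + post samples that starts `pre` samples before the largest deflection."""
    peak = max(range(len(waveform)), key=lambda i: abs(waveform[i]))
    start, stop = peak - pre, peak + post
    if start < 0 or stop > len(waveform):
        raise ValueError("window extends past the recorded IDS")
    window = list(waveform[start:stop])
    # Assumption: IDS may start in either direction, so flip the polarity
    # to make the major deflection (now at index `pre`) positive.
    if window[pre] < 0:
        window = [-v for v in window]
    return window


def scale_to_peak(window, target=100.0):
    """Scale so the largest absolute value equals `target` (hypothetical scaling rule)."""
    peak = max(abs(v) for v in window)
    if peak == 0:
        raise ValueError("flat waveform cannot be scaled")
    return [v * target / peak for v in window]


def variance_index(waveforms, pre, post, target=100.0):
    """Mean, over aligned time points, of the sample variance across IDS waveforms."""
    if len(waveforms) < 2:
        raise ValueError("need at least two IDS waveforms")
    aligned = [scale_to_peak(align_on_major_deflection(w, pre, post), target)
               for w in waveforms]
    # zip(*aligned) yields one tuple per time point holding every waveform's value there
    pointwise = [variance(point) for point in zip(*aligned)]
    return mean(pointwise)
```

Identical IDS shapes give a VI of 0, and the VI grows as the waveforms diverge, which matches the sense in which a lower VI indicates more uniform swallow-associated sounds.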

Results

The adult study participants included 20 individuals with no history of swallowing disorder. There were 10 males and 10 females with ages ranging from 26 to 59 years. The average age of the male participants was 37.1 years (standard deviation [SD] = 9.2) and for female participants it was 39.6 years (SD = 10.9; p = ns [not significant]).

Swallow-associated Signals of Adults

Figure 2 shows a comparison of the accelerometric and acoustic data from a single swallow of thin liquid by study subject 1. The characteristics of the two signals are clearly different: the vibratory character of the acoustic signal collected by the microphone is evident when it is compared with the accelerometric channel. For each bolus consistency (liquid, puree, solid), we measured the duration of the entire swallow, the duration of the IDS, and the duration of bolus passage, and compared the accelerometric and acoustic durations. Previous authors have shown that carbonated beverages have swallow-associated properties different from those of other thin liquids [4]; this study has too few subjects to explore these differences. Within the limits of our small sample, there was no significant difference between the accelerometer and the microphone in the durations of the signals associated with the entire swallow or with bolus passage for any consistency. There was a pattern of longer swallow duration with increasing bolus viscosity, recorded with both the accelerometer [mean = 0.463 s (liquid), 0.501 s (pureed), and 0.614 s (solid)] and the microphone [mean = 0.418 s (liquid), 0.467 s (pureed), and 0.600 s (solid)]. The difference between liquids and solids was statistically significant (p < 0.05) for both the accelerometer and the microphone (ANOVA, post hoc Tukey test). The same pattern was noted in the signal associated with bolus passage for each detection device. The mean durations of swallow-associated signals (total swallow and bolus passage) for each acoustic detection device are shown in Table 1.
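The liquid-versus-solid comparison above rests on a one-way ANOVA. As an illustration of what that test computes, here is a standard-library Python sketch of the F statistic. It is not how SigmaStat was run, and it stops at F: converting F to a p value and performing the post hoc Tukey test require the F and studentized-range distributions from a statistics package. The inputs would be the per-swallow durations for each consistency.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need two or more non-empty groups and more observations than groups")
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: scatter of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F means the spread between consistency means is large compared with the swallow-to-swallow spread within each consistency.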

Fig. 2

Diagram of accelerometric and acoustic data for a single swallow. IDS signals have a predictable morphology and temporal relationship to the onset of swallow. The microphone signals are acoustic and vibratory in nature. The accelerometric data included inaudible signals associated with acceleration and motion in addition to the audible signals

Table 1 Duration of swallow-associated signals

Variance Index of Adult and Infant Swallows

The Variance Index (VI) was calculated as described above. Because the VI data were not normally distributed, results are reported as median (25th percentile, 75th percentile). The median VI of adults swallowing liquid was 29.1 (24.1, 36.6); for pureed foods it was 45.6 (27.9, 55.3), and for solids it was 49.6 (26.0, 65.4) (Fig. 3a). Although there appears to be a trend toward less variability in the liquid swallow-associated sounds, the difference among consistencies is not statistically significant (p = 0.130, Kruskal-Wallis ANOVA on Ranks).
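A minimal standard-library sketch of the median (25%, 75%) summary used for non-normally distributed VI values follows. Percentile conventions differ between packages. The quartiles from Python's "exclusive" method, shown here, may therefore not match SigmaStat's exactly.

```python
from statistics import median, quantiles


def median_and_quartiles(values):
    """Return (median, 25th percentile, 75th percentile) of a list of VI values."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return median(values), q1, q3


def format_summary(values, decimals=1):
    """Format as 'median (25%, 75%)', the style used in the text."""
    m, q1, q3 = median_and_quartiles(values)
    return f"{m:.{decimals}f} ({q1:.{decimals}f}, {q3:.{decimals}f})"
```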

Fig. 3

VI for adults swallowing each type of bolus and for preterm infants feeding. (a) Box plot of adult data showing VI changes with varying consistency of bolus. Although there is a trend toward more variability with increasing viscosity, the difference is not statistically significant. (b) Box plot of adults swallowing liquid and preterm infants feeding. The VI for infants feeding before 36 weeks PMA is significantly greater than that for both the adults swallowing liquids and the infants feeding after 36 weeks PMA. There is no significant difference between the adults and the infants after 36 weeks PMA

The adult VI data were compared with those of the previously studied preterm infant group, which consisted of 12 studies performed on ten low-risk preterm infants, defined as infants not on oxygen and with no BPD, no grade III or IV intraventricular hemorrhage, no maternal drug use, no concurrent sepsis, and no craniofacial anomalies. Feeding studies in the preterm infants were divided into two groups by PMA: five studies occurred before 36 weeks PMA and seven after 36 weeks PMA. Because all of the participating infants were fed only breast milk or formula, the VI of the preterm group was compared only with the VI of adults swallowing liquids. The VI for infants fed before 36 weeks PMA was 49.0 (46.4, 51.1) [median (25%, 75%)]; for infants fed after 36 weeks it was 36.3 (33.4, 41.9) (Fig. 3b). The VI for low-risk preterm infants fed before 36 weeks was significantly higher than the VI for both the infants fed after 36 weeks (p = 0.008) and the adults swallowing liquids (p = 0.001) (Kruskal-Wallis ANOVA on Ranks, post hoc Dunn’s method). There was no significant difference between the VI for adults swallowing liquids and the VI for preterm infants fed after 36 weeks PMA.
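The infant-adult comparison used the Kruskal-Wallis ANOVA on ranks. The standard-library Python sketch below computes the tie-corrected H statistic and the usual large-sample chi-square p value with k - 1 degrees of freedom. For the three groups compared here (infants before 36 weeks PMA, infants after 36 weeks PMA, adults swallowing liquid), there are 2 degrees of freedom, and the chi-square tail has the closed form exp(-H/2). SigmaStat may handle small samples differently, and Dunn's post hoc comparisons are not shown.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and the size of each tie block."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value equal to the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Chi-square upper-tail probability; closed forms for df == 1 and even df only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df < 1 or df % 2:
        raise NotImplementedError("odd df > 1 needs a statistics library")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its large-sample chi-square p value."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks of this group's members
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    return h, chi2_sf(h, len(groups) - 1)
```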

Discussion

CA is becoming an increasingly well-researched tool for the study of dysphagia in adults, children, and infants. In 1995, CA was described as an “imprecise clinical method” [5]. However, advances in technology, particularly the ability to couple clinical CA with advanced digital signal processing techniques, have led to great improvements in the study of dysphagia. CA is inexpensive, noninvasive, and easy to use [6]. Cervical sounds of adult and infant swallowing have been qualitatively described [7, 8]. CA has been used to describe infant cry [9, 10], pharyngeal action during birth [11], and cry and feeding of infants with cleft palate [12]. CA with digital signal processing has been used to identify discrete sounds, called initial discrete sounds (IDSs) and final discrete sounds (FDSs), which are consistently associated with swallow in healthy term infants [13, 14]. The developmental progression of the IDS waveform in healthy preterm infants and infants with bronchopulmonary dysplasia has been identified [1, 2]. An optimal placement of the acoustic detection device and anchoring mechanism for CA during swallowing has been identified for adult CA studies [3]. However, there is disagreement among investigators about whether a microphone or an accelerometer is a better acoustic detection device to use for CA [3, 15].

Microphones have a frequency response curve that is unique to each microphone. This means that a microphone amplifies signals at some frequencies and attenuates them, or even fails to detect them, at others. Figure 4a shows an example of a typical frequency response curve for a microphone. At frequencies where the frequency response curve is above 0 dB, the signal is artificially amplified. This is advantageous as long as the target signal lies in the frequency range where the curve is above 0 dB. For most microphones, the frequency response curve is positive over much of the range of human hearing (approximately 20 Hz to 20 kHz). At frequencies where the curve is below 0 dB, the signal is attenuated or not detected. Microphones can be very inexpensive, costing as little as $10 depending on the type and brand selected. The microphones we used for this study cost about $30 each. The signals collected by a microphone are vibrational. Authors who favor the microphone have cited its low cost as a major reason for its use in the clinical setting [15].
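To make the effect of a frequency response curve concrete, the Python sketch below converts a gain in decibels to an amplitude multiplier (10^(dB/20)) and applies a response curve, interpolated linearly between measured points, to a single signal component. The curve values are hypothetical, not those of Fig. 4. Treating frequencies outside the measured range as undetected is a simplifying assumption. An ideal flat response would be 0 dB everywhere, leaving amplitudes unchanged.

```python
import bisect


def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier (0 dB -> 1.0)."""
    return 10 ** (gain_db / 20)


def gain_at(freq_hz, curve):
    """Linearly interpolate gain (dB) from sorted (freq_hz, gain_db) points.

    Returns None outside the measured range."""
    freqs = [f for f, _ in curve]
    if freq_hz < freqs[0] or freq_hz > freqs[-1]:
        return None
    i = bisect.bisect_left(freqs, freq_hz)
    if freqs[i] == freq_hz:
        return curve[i][1]
    (f0, g0), (f1, g1) = curve[i - 1], curve[i]
    return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)


def recorded_amplitude(freq_hz, true_amplitude, curve):
    """Amplitude a detector with this response curve would report for one component."""
    gain = gain_at(freq_hz, curve)
    if gain is None:
        return 0.0  # assumption: outside the measured curve, the component is not detected
    return true_amplitude * db_to_amplitude_ratio(gain)


if __name__ == "__main__":
    # Hypothetical microphone curve: attenuates low frequencies, boosts high ones
    mic_curve = [(100, -10.0), (1_000, 0.0), (10_000, 6.0)]
    for f in (50, 550, 1_000, 10_000):
        print(f, round(recorded_amplitude(f, 1.0, mic_curve), 3))
```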

Fig. 4

(a) Frequency response curve of the microphone used in this project. (b) Frequency response curve of the accelerometer

Accelerometers have a flat frequency response over a wide range of frequencies, which means that signals within that range are collected equally well. Figure 4b shows the frequency response curve for an accelerometer. Accelerometers are significantly more expensive than microphones. The accelerometer we used for this study cost about $800, and costs of several thousand dollars for an accelerometer and its related peripheral equipment are not uncommon. The signals collected by an accelerometer include motion, attenuated audible sound, and vibrations generated below the surface to which the accelerometer is attached. This may allow a more complete interrogation of the data associated with the target event, in this case, a swallow. Authors who favor the accelerometer have cited the wide, flat region of its frequency response curve and the variety of signals it collects as reasons to support its use in research [13, 14].

Previous authors have addressed the issue of microphones versus accelerometers with differing results. Takahashi et al. [3] found that an accelerometer fixed to the skin with double-sided paper tape maintained a flat frequency response and adhered firmly enough to hold the detector in place. Cichero and Murdoch [15] presented data in favor of the use of microphones and disagreed with the results of the previous study. They cited what they considered flaws in the method, specifically in the calibration of the accelerometer and microphones. They also had concerns regarding the airborne-noise-rejecting characteristics of the detectors used. Both studies tested the two detectors sequentially rather than simultaneously.

Some authors have tried to minimize the potential of CA as a research and clinical tool [16]. They point to the fact that experts in the field disagree over basic terminology and concepts used in CA. CA remains unstandardized and fraught with subjective interpretation of swallow sounds. Correlation of swallow-associated signals with physiologic events remains elusive. Previous authors have attempted to correlate swallow-associated signals with endoscopic or fluorographic video recordings. However, because of the large difference in the maximum sample rates available for acoustic and video signals, it is impossible to align acoustic data sampled at 44.1 kHz accurately with video data collected at 30, 60, or even hundreds of frames per second. On the basis of these limitations, these authors concluded that CA should not be pursued as a valid research or clinical technique.
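The scale of this mismatch is easy to quantify. In the Python sketch below, a single video frame spans many audio samples, so an acoustic event can be placed on the video timeline only to within one frame period. The shared start time for both recordings and the 250 frames/s rate are illustrative assumptions, not parameters taken from any cited study.

```python
def audio_samples_per_frame(audio_rate_hz: float, video_rate_hz: float) -> float:
    """Number of audio samples that fall within one video frame."""
    return audio_rate_hz / video_rate_hz


def sample_to_frame(sample_index: int, audio_rate_hz: int, video_rate_hz: int) -> int:
    """Video frame containing a given audio sample, assuming both streams start together."""
    return sample_index * video_rate_hz // audio_rate_hz


if __name__ == "__main__":
    for video_rate in (30, 60, 250):
        spf = audio_samples_per_frame(44_100, video_rate)
        print(f"{video_rate} frames/s: {1000 / video_rate:.1f} ms per frame, "
              f"{spf:.0f} audio samples per frame")
```

At 30 frames/s, for example, every video frame covers 1470 audio samples (about 33 ms), so all of those samples map to the same frame.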

We see it differently. CA as a science rather than an art is in its infancy. Our continued efforts in this field focus on providing objective, measurable, and reproducible techniques to increase the acceptance and utility of CA in research and clinical applications. To that end, this article describes what is, to our knowledge, the first head-to-head real-time comparison of accelerometers and microphones for the detection of swallow-associated signals. Furthermore, this work provides more information on the stability of swallow-associated signals into adulthood. Advances in computer technology have allowed us to improve on previous techniques and to analyze more complex signals and larger files (higher sample rates).

Our results show that the signals collected by the accelerometer and the microphone are clearly different. The microphone collects vibratory acoustic data and artificially amplifies or attenuates signals in certain frequency ranges. The accelerometer collects signals that include motion, attenuated audible sounds, and pressure changes, with equal effectiveness across its wide, flat frequency range. The similarity of the durations of the swallow-associated signals collected by the two devices in this real-time comparison supports the concept that the differing acoustic and accelerometric signals are generated by the same events in and around the pharynx. Our data also show a pattern of longer swallow-associated signals with boluses of increasing viscosity, consistent in direction with previous work by others [17], which suggested a swallow duration of 2–4 s for semisolid foods versus 1 s for thin liquids.

We chose to analyze the IDS signals of the adult swallows in particular because we have previously established characteristics of the IDS signal in infants that provide a robust parameter for measuring the stability of infant feeding. Vice et al. [13, 14] used cervical auscultation with an accelerometer to evaluate rhythmic suckle feeding in newborn infants. They identified discrete and reproducible swallow-associated signals that they called initial discrete sounds (IDS) and final discrete sounds (FDS) in recognition of their temporal relationship to pharyngeal pressure change during swallow. IDS occur at the onset of pressure change in the pharynx. The IDS waveform consists of an initial prominent deflection with a very rapid rise time in either the positive or the negative direction, graduating into activity of lower amplitude and frequency as the tracing returns to baseline. The FDS signal is less predictable in its morphology and temporal relationship to pharyngeal pressure change. We have not yet investigated the frequency, amplitude, or power of these signals. Work that will allow us to analyze these characteristics is underway in our lab.

We have used the same data collection technique to analyze the development of the stability of the IDS waveform in preterm infants with and without BPD [1, 2]. By calculating the average variance, or VI, for multiple IDS signals from a feeding event, we showed that IDS waveforms become more uniform with increasing PMA. This developmental progression was not demonstrated in infants affected by BPD. The mechanism of this immature pattern of development is as yet unknown, but it has been repeatedly shown with other measures of rhythmic suckle feeding [18–20]. Gewolb et al. [20] used a modification of the VI method to analyze other waveforms of biometric data collected during infant feeding; specifically, they analyzed the integration of suck and swallow rhythms by plotting the simultaneous pressure waveforms on X–Y axes.

In our studies of newborn infant feeding, we found that a nasopharyngeal catheter was required to reliably locate and identify swallow-associated sounds picked up by the accelerometer, especially the signals associated with specific components of pharyngeal pressure change. In this study, however, it was not feasible to place a nasopharyngeal catheter in the adult volunteers. This did not limit our study, for two reasons: the volunteers were able to reliably report the number of swallows they had used to clear the bolus completely, and the morphology of the IDS signal was conserved between infants and adults and was therefore easy to identify.

In this study we have built on our previous work by showing that not only does the IDS waveform of low-risk preterm infants become more uniform with advancing PMA, but that as the infants approach “term” PMA, the overall stability of the waveform approaches that of healthy adults. We have identified the same components in adult swallows that were present in the previous study of rhythmic suckle-feeding infants. Of particular interest is the fact that the morphology of the IDS signal in adult swallows is similar to that of the previously studied infants. Up to now, we have limited our discussions of the progression of the IDS waveforms to descriptions of the stability of the waveform without providing a qualitative assessment of the overall infant feeding. This study provides an important bridge between early infant feeding and overall development by establishing that the low-risk infants are in fact getting “better” in that the stability of the swallow-associated signals is becoming more like that of adults. This information may ultimately be used to develop tools to study a potential link between early feeding difficulties and long-term neurologic outcome.

Conclusion

In conclusion, this work has provided the first head-to-head, real-time comparison of two acoustic detection devices and has described in detail the signals collected by both. From this information we conclude that both the accelerometer and the microphone may be adequate, depending on the desired application. We suggest that for clinical listening to cervical sounds associated with swallowing, a microphone and speaker may be all that is required. However, for more complex signal analysis, including analysis of signals that are not vibratory in nature and would not be detected by a microphone, an accelerometer should be considered.

We have increased our understanding of the progression of IDS signals in low-risk preterm infants by showing that these infants are approaching adult norms as they age. Because we have now established that these infants are actually improving, we can begin to analyze the link between early feeding problems and long-term developmental outcomes. Work is currently underway to study the progression of IDS waveforms as well as other parameters of feeding stability in healthy term infants and in infants affected by other pathologic conditions.