Introduction

Swallowing is a complicated function that involves the bolus being propelled from the mouth to the stomach. It requires the coordination of 25 pairs of muscles of the oral cavity, pharynx, larynx and esophagus; thus, it is one of the more complex functions of the body [1].

Different techniques are used in the diagnostic investigation of swallowing disorders. Videofluorography is considered the gold standard, but it is an invasive method that also exposes the subject to radiation [2]. Fiberoptic endoscopy has been proposed as a routine minimally invasive method [3]. Endoluminal sonography [4] and manometry can be used, but both require insertion of a probe that, in certain conditions, can alter bolus flow. Other methods such as high-resolution manometry [5], electromyography [6], and kinetic magnetic resonance imaging [7] are currently under evaluation in specialized centers. Cervical auscultation, however, has not been used for routine assessment of swallowing.

Several research teams have shown that the noise made during swallowing carries information that could be used for noninvasive analysis of swallowing [8, 9]. In 1990, Hamlet et al. [10] coupled acoustic recordings to simultaneous videofluoroscopy with the aim of determining the source of these sounds. One acoustic study showed that increasing the volume of the ingested bolus lengthened the overall duration of the sound, whereas increasing the consistency of the bolus decreased the duration [11]. Acoustic analysis also revealed there to be several sound components (SC), including three main components [12]. In a previous study, we coupled the sound of swallowing with videofluoroscopy images. The three sound components could thereby be linked to different stages of the swallowing process [13]. The first component (SC1) was associated with the rise of the larynx, the second (SC2) with the passage of the bolus through the superior esophageal sphincter (SES), and the third (SC3) occurred during the descent and the opening of the larynx. We have also demonstrated that the acoustic signal can be modified after surgery on the laryngeal tract [14].

By reviewing the literature, we found evidence that the structure of the pharyngeal swallow differs according to the volume and consistency of the ingested bolus. As the volume swallowed increases, the length and diameter of the opening of the SES also increase [2, 15]. The same applies to the duration and amplitude of the laryngeal elevation, as measured from the displacement of the hyoid bone [16]. Moreover, as the bolus consistency becomes denser, both laryngeal elevation and the duration of the SES opening decrease and the diameter of the SES aperture increases [17]. Boiron et al. [11], and more recently Eyigör et al. [18], showed that the overall duration of swallowing sounds increases with the volume of the bolus and decreases with consistency.

There has been no previous study of the differences in swallowing sound components according to bolus type and volume. Our hypothesis was that if there were differences in the structure of the pharyngeal swallow according to the volume and consistency of the ingested bolus, this would be reflected in the features of the three swallowing sound components. The main objective of this research was to describe the variations of pharyngeal swallowing sound components (number of components, duration, and intervals) according to the volume and consistency of the bolus swallowed. The secondary objective was to identify consistencies and volumes generating the most informative acoustic signals for clinical investigations of the sounds associated with swallowing.

Materials and Methods

Participants

We enrolled 23 volunteers (10 men and 13 women; average age = 28 ± 10 years; maximum age = 59 years; minimum age = 20 years). We obtained informed consent from each subject after explaining the goal of the study and the risks involved, in accordance with current French legislation. Each volunteer’s medical history was recorded, and in particular we verified that none had a history of any swallowing disorder.

Bolus

Boluses of three different textures were studied: flat water at room temperature (dynamic viscosity measured by the Brookfield method: 1 mPa s), unsweetened yogurt (Danone®, dynamic viscosity measured by the Brookfield method: 300 mPa s), and reconstituted mashed potato (Mousline®, 125 g in 250 ml of milk and 500 ml of water at 30 °C; dynamic viscosity measured by the Brookfield method: 50,000 mPa s). Each subject swallowed 3-, 5-, and 10-ml samples of each of the three consistencies, three times each (a total of 27 swallows). Each bolus was prepared in a syringe and then transferred to the buccal cavity. Subjects were asked to swallow the bolus all at once when requested by the examiner during the recording. We allowed a minimum of 30 s between swallows. For their comfort, the subjects fasted for at least 2 h before the recording sessions.
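The swallowing protocol can be enumerated as a short sketch. It assumes the 3-, 5-, and 10-ml volumes reported in the Statistics and Results sections; the variable names are illustrative, and the order in which boluses were actually administered is not stated in the text.

```python
from itertools import product

# Factor levels taken from the protocol description; names are illustrative.
consistencies = ("water", "yogurt", "mashed potato")
volumes_ml = (3, 5, 10)
repetitions = 3

# One entry per planned swallow for a single subject.
# The listing order is arbitrary and is not the administration order.
protocol = [
    {"consistency": c, "volume_ml": v, "repetition": r}
    for c, v, r in product(consistencies, volumes_ml, range(1, repetitions + 1))
]

print(len(protocol))  # 27 swallows per subject
```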

Acquisition Material and Analysis

Each study subject was placed in a seated position. The cervical recording was obtained with a microphone (Electret tie clip microphone, 50–18,000 Hz; Sony, Japan) positioned on the skin on the right side in front of the posteroinferior border of the cricoid cartilage. The microphone was held in place by a perforated elastic band. The microphone was connected to the sound card of a portable computer.

All recordings were analyzed by a single investigator using Cool Edit Pro software (Syntrillium Software Corporation, Phoenix, AZ, USA). The swallowing sounds were analyzed by first listening to all the recordings taken from an individual subject while looking at the displacement of the signal’s cursor. All recordings for which the acoustic swallowing signal was not visualized in the noise signal or not clearly heard were discarded. To analyze the entire swallowing sound from its start, we placed the first marker at the point at which the signal diverged from baseline (the initial deflection point), then we studied the number of SCs in each swallowing sound. To analyze the sound components (SC) within the swallowing sound, we similarly placed a marker at the beginning and the end of each SC. We called the interval between the end of a SC and the beginning of the next SC “IT.” The SCs were classified as first SC (SC1), second SC (SC2), and third SC (SC3). Consequently, there were two intervals: IT1 between SC1 and SC2 and IT2 between SC2 and SC3.
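The marker-based measurements described above can be sketched as follows. This is a minimal illustration, not the procedure carried out in Cool Edit Pro: the `Swallow` structure, the function names, and the example marker times are hypothetical, and the total duration is assumed to run from the start of the first detected component to the end of the last one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Span = Tuple[float, float]  # (start_ms, end_ms) markers of one sound component


@dataclass
class Swallow:
    """Marker positions for one recording; an undetected component is None."""
    sc1: Optional[Span]
    sc2: Span            # SC2 is taken as always present
    sc3: Optional[Span]


def duration(span: Optional[Span]) -> Optional[float]:
    """Duration of a component, or None if it was not detected."""
    return None if span is None else span[1] - span[0]


def gap(a: Optional[Span], b: Optional[Span]) -> Optional[float]:
    """Interval between the end of one component and the start of the next."""
    if a is None or b is None:
        return None
    return b[0] - a[1]


def features(s: Swallow) -> dict:
    present = [sp for sp in (s.sc1, s.sc2, s.sc3) if sp is not None]
    return {
        "SC1": duration(s.sc1), "SC2": duration(s.sc2), "SC3": duration(s.sc3),
        "IT1": gap(s.sc1, s.sc2), "IT2": gap(s.sc2, s.sc3),
        # Assumption: total duration spans the first to the last detected component.
        "td": present[-1][1] - present[0][0],
    }


# Hypothetical marker times in milliseconds, for illustration only.
example = Swallow(sc1=(0.0, 60.0), sc2=(110.0, 300.0), sc3=(380.0, 430.0))
print(features(example))
```

When SC1 or SC3 is not detected, the corresponding duration and the adjacent interval are returned as `None` rather than zero, so that they are not mistaken for measured values.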

Parameters (Fig. 1)

We calculated the percentage of all recordings that contained each SC. For each recorded sound, the total duration of the sound, the duration of each SC, and the intervals (IT1 and IT2) between the SCs were measured. For all records, the average duration of acoustic features was calculated and compared according to the volume and the consistency of the bolus swallowed. We also compared measures between men and women.
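As a sketch of these summary parameters, the snippet below computes the percentage of recordings containing each SC and the mean duration of each SC. The records are hypothetical values for illustration only, and averaging over only the recordings in which a component was detected is an assumption; the text does not state how undetected components were handled.

```python
from statistics import mean

# Hypothetical per-recording durations in ms; None marks an undetected component.
records = [
    {"SC1": 60.0, "SC2": 190.0, "SC3": 50.0},
    {"SC1": None, "SC2": 210.0, "SC3": 40.0},
    {"SC1": 70.0, "SC2": 170.0, "SC3": None},
    {"SC1": 50.0, "SC2": 230.0, "SC3": 60.0},
]


def prevalence(records, key):
    """Percentage of recordings in which the component was detected."""
    return 100.0 * sum(r[key] is not None for r in records) / len(records)


def mean_duration(records, key):
    """Mean duration over the recordings that contain the component (assumption)."""
    values = [r[key] for r in records if r[key] is not None]
    return mean(values) if values else None


for key in ("SC1", "SC2", "SC3"):
    print(key, prevalence(records, key), mean_duration(records, key))
```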

Fig. 1 Example of acoustic recording analyzed using Cool Edit Pro software and the physiologic correlation of each sound component. SC, sound component; td, total duration of the sound; IT, interval

Statistics

Normality of distribution was verified by a Kolmogorov–Smirnov test and homogeneity of variances by a Levene test for all parameters. Total duration was analyzed with respect to the consistency and the volume of the bolus by a one-factor (gender) repeated-measures ANOVA [3 (consistency: water; yogurt; mashed potato) × 3 (volume: 10; 5; 3 ml)], with the Greenhouse–Geisser correction, followed by Fisher's LSD post hoc test to compare paired averages. SC duration was analyzed in the same way, with the component as an additional within-subject factor [3 (consistency) × 3 (volume) × 3 (SC duration: SC1; SC2; SC3)], as was the SC interval [3 (consistency) × 3 (volume) × 2 (interval: IT1; IT2)]. If no effect of gender was found, analyses were continued without this factor.
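To illustrate the repeated-measures logic, the sketch below computes the F statistic for a single within-subject factor in plain Python. It is a deliberately simplified stand-in for the analysis described above: it omits the gender factor, the second and third within-subject factors, the Greenhouse–Geisser correction, and the post hoc tests. The function name and the toy data are hypothetical.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA for a complete design.

    data[i][j] is the measurement of subject i under condition j.
    Returns (F, df_condition, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Between-subject variability is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Toy data: 3 subjects x 3 conditions (illustrative values, not study data).
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 8]]
f_value, df1, df2 = rm_anova_oneway(toy)
print(round(f_value, 6), df1, df2)
```

With 20 subjects and three levels of a within-subject factor, the same formulas give 2 and 38 degrees of freedom.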

Results

Population

Twenty-three subjects were included in this study, yielding 621 acoustic recordings. Because the acquisition technique was modified during the study (see the Discussion section), 13 % of the recordings were excluded, leaving 540 recordings for analysis. The mean age of the subjects was 28 ± 10 years, the mean weight was 61 ± 10 kg, the mean height was 169 ± 10 cm, and the mean body mass index (BMI) was 21 ± 2 kg/m² (Table 1). Both weight and height were significantly different between women and men (p < 0.0001 for both tests). There was no significant difference between the two groups for age or BMI.
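The recording counts can be checked with simple arithmetic. The sketch assumes that the excluded recordings are those of the first three subjects, whose recordings are described as unusable in the Discussion; this matches the reported totals but is not stated explicitly.

```python
subjects = 23
swallows_per_subject = 27   # 3 consistencies x 3 volumes x 3 repetitions
excluded_subjects = 3       # assumption: the first three subjects (see Discussion)

total = subjects * swallows_per_subject
excluded = excluded_subjects * swallows_per_subject
analyzed = total - excluded
excluded_pct = 100 * excluded / total

print(total, excluded, analyzed, round(excluded_pct))
```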

Table 1 Comparison between men and women for age, weight, height, and BMI

Acoustic Results

No significant difference was found between men and women for any of the measures studied (total duration: F 1,18 = 0.11; p = 0.74; SC duration: F 1,7 = 0.03; p = 0.86; interval duration: F 1,7 = 1.15; p = 0.32). The percentages of the recordings containing the various SCs were as follows: 100 % for SC2, 81 % for SC1, and 77 % for SC3. These values were independent of bolus type.

For the total duration of the sound, there was a significant effect of bolus volume (F 2,38 = 9.11; p < 0.0001), with a significant difference between boluses of 10 ml and those of 5 and 3 ml (p = 0.018, 10 vs. 5 ml for water; p = 0.008, 10 vs. 3 ml for water; p = 0.018, 10 vs. 5 ml for yogurt; p = 0.001, 10 vs. 3 ml for yogurt; p = 0.04, 10 vs. 5 ml for mashed potato; p = 0.001, 10 vs. 3 ml for mashed potato). The mean total duration of the sound was 515 ± 217 ms for boluses of 10 ml, 441 ± 150 ms for boluses of 5 ml, and 411 ± 155 ms for boluses of 3 ml (Fig. 2). No effect of consistency was found (F 2,38 = 0.54; p = 0.59), nor any interaction between consistency and volume (F 4,76 = 0.96; p = 0.9).

Fig. 2 Mean total duration of sound in milliseconds according to the volume and type of bolus swallowed

The duration of SC2 increased with bolus volume, independent of the consistency of the bolus (p = 0.0002, 10 vs. 5 ml for water; p = 0.02, 5 vs. 3 ml for water; p = 0.009, 10 vs. 5 ml for yogurt; p = 0.04, 5 vs. 3 ml for yogurt; p = 0.046, 10 vs. 5 ml for mashed potato; p = 0.05, 5 vs. 3 ml for mashed potato). Similarly, the duration of SC2 was significantly greater for the 10-ml mashed potato bolus than for the 10-ml water or yogurt bolus (p = 0.01 and 0.03, respectively). There was no detectable difference between yogurt and water whatever the bolus volume or between mashed potato and the two other textures for boluses of 5 and 3 ml (Fig. 3).

Fig. 3 Mean duration of the second sound component (SC2) in milliseconds according to the volume and viscosity of the bolus swallowed

The values of SC1 and SC3 and the intervals IT1 and IT2 did not differ significantly with respect to the bolus.

Discussion

We found that the volume and consistency of the bolus swallowed affected the acoustic parameters. In particular, the total duration of the sound increased with increasing volume, and SC2 lasted longer for the foodstuff with the thickest consistency (mashed potato) at the 10-ml volume. We confirmed that SC2 is the sound component that is always present in the sound of swallowing and that the duration of SC2 depends on the bolus volume and consistency.

Development of the Acoustic Technique

The acoustic recording technique was the same as that developed by Morinière et al. [13], and the position of the microphone was that defined by Takahashi et al. [19]. However, the recordings of the first three subjects could not be used because of their poor quality and the presence of many sound artifacts associated with movement of the microphone during swallowing. We therefore mounted the microphone on a stethoscope chest piece, as described by Boiron et al. [20], and the resulting recordings were of satisfactory quality and reproducible.

Limitations of the Technique

Analysis of the acoustic data requires 1 h per subject, which is too long for the technique to be of routine use. We are working on the development of automated analysis software to shorten this time. Cervical auscultation explores only the pharyngeal phase of swallowing, the second of its three phases, and therefore cannot by itself ensure a complete study of swallowing disorders. The absence of morphological data is a further limitation of the technique.

Number of Subjects

Thirteen women and ten men were included in the study, allowing comparison between sexes; indeed, there was no significant difference between these two groups with respect to age or BMI. The mean age of the subjects included in the analysis was 28 ± 10 years. It was therefore a young population, and informative age stratification was not possible.

Bolus Volume and Consistency

We chose bolus volumes of 5 and 10 ml because these volumes have been used in many acoustic studies of swallowing. A bolus volume of 3 ml was also used to establish reference values for a volume suitable for patients who may be able to swallow only small amounts. Reconstituted dried mashed potato, yogurt, and water were used to constitute boluses of different consistencies. They provided a range of textures that can be used on a regular basis and that do not pose a swallowing problem in patients. In addition, water is frequently used in acoustic studies, yogurt was used by Boiron et al. [11], and mashed potato by Youmans and Stierwalt [21], which gave us several points of comparison. Each subject swallowed three boluses of each volume and each consistency, as in previously published acoustic studies. Morinière et al. [13] estimated that five recordings are necessary for the rigorous characterization of the sounds. Our analysis of the first subjects showed that the acoustic recordings were of good quality and reproducible; we therefore decided that three recordings for each volume and consistency were sufficient. Boluses were administered by syringe because this allowed appropriate mixing and control of the volume administered. This approach was also used by Boiron et al. [11], Morinière et al. [13], and Youmans and Stierwalt [21]. Its disadvantage is the small diameter of the syringe’s nozzle, which changes the consistency of the foodstuff administered, in some cases making it more fluid. Eyigör et al. [18] used a graduated glass or spoon to avoid this disadvantage.

Acoustic Results

The percentages of recordings that contained the various sound components were as follows: 81 % for SC1, 100 % for SC2, and 77 % for SC3. The bolus type had no influence on these values. These results are comparable to those reported by Morinière et al. [14] (81 % for SC1, 100 % for SC2, and 81 % for SC3). The lower detection rates of SC1 and SC3 can be explained by differences in the duration and intensity of the SCs. Indeed, as described by Morinière et al. [14], SC2 is the longest and most intense sound component, while SC3 is shorter and less intense. SC1 and SC3 may therefore be present but go undetected by the examiner because of their short duration or low intensity.

The mean total duration of sound was 515 ± 217 ms for a bolus of 10 ml, 441 ± 150 ms for a bolus of 5 ml, and 411 ± 155 ms for a bolus of 3 ml. These times are consistent with the values reported by Takahashi et al. [22] and Cichero et al. [23] but are considerably shorter than those found by Perlman et al. [24], Boiron et al. [11], Morinière et al. [13], and Youmans and Stierwalt [21] (Table 2). Youmans and Stierwalt [21] suggested that these differences are a consequence of the ages of the populations tested; indeed, the sound of swallowing lasts longer in older subjects [18, 21].

Table 2 Total duration of the swallowing sound according to the volume and consistency of the bolus reported in the literature

Like Boiron et al. [11] and Youmans and Stierwalt [21], we found that increasing the volume swallowed led to an increase in the duration of the sound: the sound of swallowing lasted longer for boluses of 10 ml than for 5- and 3-ml boluses. The absence of difference between the 5- and 3-ml boluses may be due to the small difference in volume between the two.

We did not find any difference in the total duration of the sound with respect to the consistency of the bolus, unlike the findings reported in the two studies cited above [11, 21]. The absence of difference may have been due to the mode of administration of the bolus. Indeed, as described above, mashed potato and yogurt may have been made more fluid by passage through the syringe such that their consistency became similar to that of water.

We found that the total duration of the sound did not differ significantly between men and women. This result is in agreement with the study by Cichero et al. [23], which involved swallowing juice, and with the study of Youmans and Stierwalt [21]. By contrast, Lebel et al. [25] and Takahashi et al. [22] found that the duration was shorter for women than for men using volumes of water of 13 and 5 ml.

The duration of SC2 increased with bolus volume for all three bolus consistencies. The duration of SC2 was significantly longer for 10 ml of mashed potato than for 10 ml of water or yogurt. These results are consistent with those for the total duration of the sound in the studies cited above. SC2 corresponds to the passage of the bolus through the SES. The result can be explained by the fact that increasing bolus volume and consistency lengthens both the SES opening and the passage of the bolus through the SES. This finding could be useful for the diagnosis of swallowing disorders due to insufficient SES opening, such as Zenker’s diverticulum, neurological disorders including Parkinson’s disease, and post-radiation stricture.

The analysis of the duration of SC2 is more informative than that of the total duration because it is more discriminative. Indeed, SC2 varies with the consistency of the bolus while the total duration of the sound does not. This can be explained by the fact that the total duration of the sound also includes SC1 and SC3, which are independent of the swallowed bolus and may therefore mask a significant difference caused by an increase in the SES opening. In future studies, it seems more appropriate to consider SC2 alone rather than the total duration of the sound or the two other SCs.

We did not find any difference between yogurt and water whatever the volume, or between mashed potato and the other two textures for boluses of 5 and 3 ml. The absence of difference between yogurt and water may have been due to the mode of administration of the bolus through the syringe, as discussed above. Thus, it can be concluded that water and yogurt, as boluses of 5 and 3 ml, provide the same acoustic information. It is not necessary to multiply these measures. A meal of mashed potato and water for volumes of 10 and 5 ml seems sufficient to obtain pertinent acoustic results. In our future studies we will focus on these boluses that are the most informative.

The average lengths of SC1, SC2, SC3, IT1, and IT2 for 10 ml found in our study agreed with those of our previous study [13].

Conclusion

The total duration of the sound of swallowing, and, in particular, the second sound component (SC2), depends on the bolus. The differences are mainly between the thickest-consistency bolus and the two other consistencies, and between the largest bolus (10 ml) and the two other volumes. This result was obtained in a small population of healthy subjects. It must be confirmed in a larger population in order to assess variations between subjects due, in particular, to the degree of SES strength or the diameter of the pharynx.

SC2 is the most characteristic element in the sound of swallowing, being both the most prevalent component (present in 100 % of recordings) and the most sensitive to variations in the nature of the bolus. It can be used for the study of swallowing disorders due to SES dysfunction.

To extend this study, we are currently working on the design of a typical or reference meal to be used in an investigation whose goal is to exploit noninvasive, acoustic swallowing analyses to establish a method for detecting an early warning of swallowing disorders.