Introduction

Cervical auscultation has been used for the assessment of swallowing problems for more than two decades [1]. Several studies have sought to clarify the nature of cervical auscultation and swallowing sounds. The stethoscope was the initial tool for cervical auscultation of swallowing. However, stethoscope-based perceptual evaluation of swallowing sounds was considered highly subjective. Thus, acoustic analysis of cervical auscultation emerged as a way to bring objectivity to the analysis [2,3,4,5,6]. The acoustic signals were recorded using microphones attached at the cervical level. The best position for microphone placement was estimated to be the level of the cricoid [5]. Accelerometers were also considered for recording. The comparison revealed that the microphone was more appropriate than the accelerometer for recording swallowing sounds [2].

The literature shows that a recorded swallowing signal can be divided into three periods, called Swallowing Signal Wave 1, 2, and 3 (SSW1, SSW2, and SSW3, respectively). These three components represent three stages of swallowing and differ in duration and intensity. The first wave, SSW1, represents the posterior tongue and hyoid movement. The second wave, SSW2, represents the elevation of the larynx and hyoid bone and epiglottic closure. The last wave, SSW3, corresponds to the repositioning of the structures [7]. SSW2 can also be considered the opening sound of the upper oesophageal sphincter [8].

Different authors have used various parameters for the evaluation of swallowing sounds. Although the parameters were named differently, they were related and, to an extent, the same. The researchers used frequency, intensity, and duration parameters of the recorded swallowing sounds, and the spectrogram was a common tool for the evaluations. Cichero and Murdoch (2002a) explored the acoustic features of swallowing sounds in 59 adults and reported that the peak intensity occurred 0.193 s into the signal, at a frequency of 550 Hz [2]. The signal had a total duration of 0.4 s on average. Studies have also revealed a significant effect of bolus volume on the acoustic parameters of the swallow [9].

Previous literature describes the normal aspects of cervical auscultation across age groups and sexes, and reports that age, sex, and the consistency and volume of the bolus affect cervical auscultation [6, 9, 10, 11]. Studies that have investigated swallowing acoustics using quantitative methods are scarce. Hence, there is a need to document the acoustic characteristics of swallow sounds in individuals with swallowing problems. Therefore, a preliminary attempt was made to describe the acoustics of swallowing sounds in individuals with swallowing difficulties and to ascertain whether they differed from those of typical individuals. The specific aim was to explore the acoustic characteristics of cervical auscultation in individuals with dysphonia and dysphagia.

Methods

Participants

Four individuals diagnosed with oropharyngeal dysphagia and four individuals with dysphonia, ranging in age from 60 to 70 years (mean age: 65.7; SD: 9.25), were recruited from the institute for the study. The participants with dysphagia were selected after the diagnosis was confirmed using clinical swallow examination and fiberoptic endoscopic evaluation of swallowing by an experienced ENT surgeon. The dysphonic individuals were recruited after the diagnosis of dysphonia was confirmed using videostroboscopy by the ENT surgeon and perceptual and instrumental evaluation of voice by a speech-language pathologist (SLP).

Informed consent was obtained from all participants after the aim of the study was explained. Ethical clearance was obtained from the institutional ethical committee for bio-behavioral research in human subjects prior to initiation of the study (No.SH/EC/PhD/SLP-11/2023–2024, dated: 26-09-2023).

Materials and Instruments

The AKG C411/PP vibration pickup contact condenser microphone was used to record the cervical auscultation signal. The microphone was connected to the Computerized Speech Lab (CSL) 4500 (KayPENTAX of America, Inc.) to record the data.

Procedure

All participants were assessed individually in a laboratory environment. The microphone was placed at the cricoid level of the neck of each participant using a Velcro belt (Fig. 1a).

Fig. 1

(a) Microphone placement on the neck using a Velcro belt; (b) normal swallow signal (sound wave and spectrogram). Note: DAS, duration of the acoustic signal; DPI, duration to peak intensity; PI, peak intensity; FPI, frequency at peak intensity

Water at room temperature was given to the participants in four quantities (5 ml, 10 ml, 15 ml, and 20 ml). The participants were instructed to swallow each quantity of water in a single gulp. An interval of 1 min was given between swallow trials. The swallow sounds were recorded on the CSL 4500 main module with a sampling frequency of 44,000 Hz, and the visual display of the signal was checked to confirm signal acquisition for the entire duration of the swallow.

Acoustic analysis was carried out using Praat software, version 6.3. The parameters and norms given by Youmans & Stierwalt [11] were used; these are listed in Table 1. A normal acoustic wave and spectrogram are depicted in Fig. 1b. The following parameters were extracted from the recorded swallow sounds.

Parameters

Frequency at peak intensity (FPI; Hz): The frequency at the point of maximum intensity on the spectrogram and intensity contour.

Peak intensity (PI; dB): The maximum intensity of the swallowing signal, identified manually with the cursor on the spectrogram and intensity contour.

Mean intensity (MI; dB): The average intensity of the entire swallowing signal, extracted digitally from the intensity contour.

Minimum intensity (MinI; dB): The minimum intensity across the entire swallow signal, extracted digitally from the spectrogram and intensity contour.

Duration of acoustic signal (DAS; ms): The duration of the entire swallowing signal in milliseconds.

Duration to peak intensity (DPI; ms): The time from swallow onset to the point at which intensity reached its peak value, calculated manually using the spectrogram and intensity contour.
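The parameter definitions above can be illustrated with a minimal computational sketch. It is not the procedure used in this study, where the measurements were made manually in Praat. The sketch assumes a mono signal held as a list of floats scaled to ±1, a frame-based RMS intensity contour, and a fixed intensity threshold to mark swallow onset and offset. The function names, the 10 ms frame length, and the -40 dB threshold are hypothetical choices, and the levels are expressed in dB relative to full scale rather than on Praat's dB scale.

```python
import cmath
import math


def intensity_contour(samples, rate, frame_s=0.01):
    """Return (frame length in samples, RMS level in dB re full scale per frame)."""
    n = max(1, round(rate * frame_s))  # samples per non-overlapping frame
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        rms = math.sqrt(sum(x * x for x in frame) / n)
        levels.append(20 * math.log10(max(rms, 1e-12)))  # floor avoids log(0)
    return n, levels


def peak_frequency(frame, rate):
    """Frequency (Hz) of the largest non-DC bin of a naive DFT of one frame."""
    n = len(frame)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                  for j, x in enumerate(frame))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * rate / n


def swallow_parameters(samples, rate, threshold_db=-40.0):
    """Estimate DAS, DPI, PI, MI, MinI, and FPI from one swallow recording.

    Onset and offset are the first and last frames above threshold_db
    (a hypothetical criterion, not one taken from the study).
    """
    n, levels = intensity_contour(samples, rate)
    active = [i for i, level in enumerate(levels) if level > threshold_db]
    if not active:
        raise ValueError("no frame exceeds the threshold")
    first, last = active[0], active[-1]
    segment = levels[first:last + 1]
    peak = first + max(range(len(segment)), key=segment.__getitem__)
    return {
        "DAS_ms": 1000 * (last - first + 1) * n / rate,  # onset to offset
        "DPI_ms": 1000 * (peak - first) * n / rate,      # onset to peak frame
        "PI_dB": levels[peak],
        "MI_dB": sum(segment) / len(segment),            # arithmetic mean of dB values
        "MinI_dB": min(segment),
        "FPI_Hz": peak_frequency(samples[peak * n:(peak + 1) * n], rate),
    }
```

Calling `swallow_parameters(samples, rate)` on a recording returns the six values in one dictionary. The mean intensity here is a simple average of the dB values in the contour; other averaging methods, such as energy averaging, would give somewhat different numbers.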

Additionally, long-term average spectrum (LTAS) analysis was performed using the Analysis of Dysphonia in Speech and Voice (ADSV) program of the Computerized Speech Lab 4500 (KayPENTAX). The LTAS was used for the accurate estimation of FPI.
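As a rough illustration of the idea behind LTAS, the sketch below averages the power spectra of successive windowed frames and reports the frequency of the strongest non-DC bin. It is a generic textbook formulation, not the ADSV implementation, whose internal settings are not described here. The function names, the Hann window, and the 64-sample frame length are assumptions made for the example, and the naive DFT is chosen for self-containment rather than speed.

```python
import cmath
import math


def ltas(samples, rate, frame_len=64):
    """Average power spectrum over non-overlapping Hann-windowed frames.

    Returns (freqs_hz, mean_power) for bins 0 .. frame_len // 2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * j / frame_len)
              for j in range(frame_len)]
    bins = frame_len // 2 + 1
    totals = [0.0] * bins
    frames = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = [w * x for w, x in zip(window, samples[start:start + frame_len])]
        for k in range(bins):
            acc = sum(x * cmath.exp(-2j * math.pi * k * j / frame_len)
                      for j, x in enumerate(frame))
            totals[k] += abs(acc) ** 2  # accumulate power per frequency bin
        frames += 1
    if frames == 0:
        raise ValueError("signal is shorter than one frame")
    freqs = [k * rate / frame_len for k in range(bins)]
    return freqs, [t / frames for t in totals]


def ltas_peak_frequency(samples, rate, frame_len=64):
    """Frequency (Hz) of the strongest non-DC bin of the averaged spectrum."""
    freqs, power = ltas(samples, rate, frame_len)
    k = max(range(1, len(power)), key=power.__getitem__)
    return freqs[k]
```

Because the spectrum is averaged over the whole signal, the peak found this way reflects where energy is concentrated across the entire swallow rather than at the single instant of maximum intensity, so it need not coincide with a value read from the spectrogram at the peak.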

Table 1 Normative values given by Youmans & Stierwalt [11]

Results

The acoustic analysis of the swallowing signal was done manually for each individual. The morphological characteristics of the SSWs and the results for each individual are explained below. The mean and standard deviation of each parameter were estimated using SPSS Statistics software, version 27 (Table 2). The DAS, PI, DPI, and FPI were compared with the existing normal values given by Youmans & Stierwalt [11]. A summary of the means and standard deviations for DAS, PI, DPI, and FPI is depicted in Fig. 3.
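For readers who wish to reproduce this kind of summary outside SPSS, the descriptive statistics reduce to a few lines. The sketch below is illustrative only: the function names are hypothetical, and any input values or normative means supplied to it are placeholders chosen by the user, not data from this study.

```python
import statistics


def summarize(values):
    """Mean and sample standard deviation (n - 1 denominator) of one parameter."""
    return statistics.mean(values), statistics.stdev(values)


def difference_from_norm(values, norm_mean):
    """Signed difference between the group mean and a normative mean."""
    return statistics.mean(values) - norm_mean
```

A positive value from `difference_from_norm` indicates that the group mean lies above the normative mean for that parameter, and a negative value indicates that it lies below.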

Fig. 2

(a) Sound wave and spectrogram of multiple swallows during the 5 ml water swallow in participant 1; (b) the prolonged interval between SSW2 and SSW3 in participant 3; sound wave and spectrogram of the (c) 15 ml and (d) 20 ml water swallows of participant 4; (e) the acoustic wave of a cough after SSW2; spectrograms showing equally amplified SSWs for the (f) 5 ml and (g) 15 ml water swallows

Participant 1

The participant was a 78-year-old male diagnosed with hoarse voice quality and voice tremor. Multiple swallow events were present for all quantities of liquid. An unusual first SSW, with a peak intensity as high as that of the second SSW, was noted in all 3 quantities (Fig. 2a).

Participant 2

The participant was a 58-year-old male diagnosed with Muscle Tension Dysphonia (MTD-III). The swallowing sound was recorded for all quantities. The first SSW was missing at 5 ml and 10 ml, while all three SSWs were present at 15 ml and 20 ml. The participant did not report any swallowing difficulties.

Participant 3

The participant was a 69-year-old female diagnosed with dysphonia. The interval between the second and third SSWs was longer than usual. Perceptually, the swallowing sound was more strained and higher pitched than normal swallowing sounds.

Participant 4

The participant was a 55-year-old female diagnosed with oropharyngeal dysphagia with dysarthria. The waveforms for the 15 ml and 20 ml swallows differed from the usual pattern. There were 3 waves in the SSW2 of the 15 ml water swallow. The SSW1 was more evident and had a double peak at 15 ml. The 20 ml recording showed 2 prominent waves in the entire response.

Participant 5

The 77-year-old male was diagnosed with oropharyngeal dysphagia with dysarthria. All SSWs were present in all quantities except 15 ml.

Participant 6

The participant was a 49-year-old female diagnosed with oropharyngeal dysphagia. The swallowing sounds were perceived by the examiner as strained and high-pitched. An aspiration event was observed with 10 ml of liquid, so the procedure was terminated at 10 ml. The SSW3 was absent, and the preceding SSW was followed by a cough associated with aspiration.

Participant 7

The participant was a 77-year-old male diagnosed with oropharyngeal dysphagia. The recorded swallow sounds were perceptually higher in pitch and strained. The waves were unclear, and the three SSWs could not be distinguished clearly (Fig. 2f and g). The last wave appeared after a gap and had an intensity almost equal to that of the second wave. The procedure was terminated after 15 ml due to an unsafe swallow.

Fig. 3

Comparison graphs of DAS, DPI, PI, and FPI with normal values. Note: normal means are those given by Youmans & Stierwalt [11]; DAS, duration of the acoustic signal; DPI, duration to peak intensity; PI, peak intensity; FPI, frequency at peak intensity

Participant 8

The participant was a 61-year-old male diagnosed with dysphagia and dysphonia. The participant coughed after swallowing 5 ml of liquid. The rest of the trials elicited a safe swallow. The 5 ml recording showed a clear second SSW followed by another high-intensity peak after 0.5 s. The first wave was absent at 5 ml, 10 ml, and 15 ml, while the third was absent at 10 ml.

Table 2 Mean and standard deviation

Discussion

The study was a preliminary attempt to identify the acoustic features of swallowing sounds in individuals with disorders of swallowing and voice. The results indicated that the participants of the current study exhibited altered acoustic characteristics when compared with the findings reported in the literature. Complex waveform morphologies with longer durations and multiple waves were noticed. The presence of unexpected waves in the recordings could indicate multiple swallows rather than the wave components of a single swallow sound. The presence of multiple waves, even for smaller quantities of water, suggests an inability to perform an efficient swallow.

The majority of the participants showed a DAS of nearly 1 s, while some needed more than 1 s to complete the swallow. A volume effect was also observed. The literature suggests that the duration of a normal swallow should be less than 1 s [2, 3, 8, 11]. The durations of the components of the swallowing signal also differed from the average values estimated by previous studies in individuals with dysphagia, in which the first and second intervals between the waves were 108 ms and 236 ms, respectively [8]. In the present study, individuals with swallowing difficulties showed longer second intervals than those reported previously. The duration of the SSW2 increased with bolus volume in five of the eight participants, which agrees with the findings of Hammoudi et al. [9]. The total duration was longer in all participants of the current study, which contrasts with the DAS values reported by Hammoudi et al. [9].

When the intensity parameter was compared, the swallow sounds of participants in the current study had higher intensity than those reported earlier. The majority showed a PI greater than 70 dB, which is higher than the normal values of Youmans & Stierwalt [11]. A possible reason is the increased effort made at the pharyngeal level by individuals with swallowing difficulties. The gain of the recordings could also be an influencing factor. The FPI values were consistent with the findings of Youmans & Stierwalt [11], indicating a possible absence of swallowing difficulties. The FPI values estimated using LTAS differed from those obtained using the spectrogram: the LTAS values were lower than those obtained using the spectrogram and intensity contour. Perceptually, the sounds were tense and high-pitched, indicating the possible presence of effortful swallow events in people with dysphagia.

The results varied across the participants of the current study, signifying distinct variability in pharyngeal function in individuals with swallowing disorders and voice problems. Although the parameters were compared with norms from the literature, there is a need for comparison studies of individuals with and without swallowing issues using the same method. Although the parameters are well defined for differentiating normal and abnormal swallows, there is still a need to generalize the findings and estimate cut-off scores for clinical use.

Conclusion

This pilot exploration measured the acoustics of cervical auscultation in only eight individuals with either voice or swallowing disorders. As participants aged 60 years and above were included, aging and age-related factors could have influenced the results; these were not considered. The diverse waveform morphologies and mixed acoustic findings, as against those reported in the literature, necessitate continued focused research to ascertain parameters that are of diagnostic importance for early identification and differential diagnosis of swallowing disorders due to varied causative factors. Acoustic analysis of cervical auscultation is a potential screening tool and may even become a method for diagnosing swallowing problems. Systematic exploration of cervical auscultation is essential to document the nature of swallowing sounds in individuals with swallowing problems. Replication with a larger group of participants is essential to generalize the results, establish norms, and apply them in routine clinical evaluation.