1 Introduction

Swallowing is a sequential process with distinct oral, pharyngeal, and esophageal phases that carry saliva, solids, and fluids from the mouth to the stomach. Alterations in this process are known as dysphagia. Normal swallowing involves the coordination of 30 pairs of muscles [5]. The infrahyoid muscles, together with other laryngeal and pharyngeal muscles, play an important role in the pharyngeal phase [8]. In particular, the infrahyoid muscles pull the hyolaryngeal complex down toward the sternum [16]. Such displacement moves the larynx under the base of the tongue and closes the laryngeal vestibule before the upper esophageal sphincter opens [5]. Studies of the surface electromyographic (sEMG) signal from the infrahyoid muscles can help to identify alterations in swallowing, i.e., the presence of dysphagia [12, 19, 24]. However, these muscles have received relatively little attention because they are difficult to assess non-invasively [8].

Problems with sEMG acquisition from swallowing-related muscles stem from their small size and overlying fibers [5], which produce a low signal-to-noise ratio (SNR) and make muscle activations (bursts) difficult to detect. In the time elapsed between the end of a muscle burst and the beginning of the next one, the muscle under study is silent, but the electrode still detects background noise [13]. To evaluate the quality of the recorded signals, the background noise and SNR are usually estimated by manual or automatic segmentation of the signal in the time domain [1]. Signal processing and analysis methods can improve the SNR of sEMG measurements from the infrahyoid muscles [17]. Visual onset determination by an expert examiner is considered the gold standard for detecting bursts [25]. However, this time-consuming method has only moderate reproducibility and repeatability [4].

Automatic strategies for onset detection include single-threshold [7] and double-threshold [3] methods and the Teager-Kaiser energy operator (TKEO) [21]. Most of these methods are based on amplitude measures and are oriented toward the large muscles of the limbs. It is well known that the amplitude of sEMG signals depends on several uncontrolled variables, such as electrode-skin impedance, the volume conductor, inter-electrode distance, and electrode location [6]. Furthermore, heuristically chosen thresholds have been shown to introduce errors into onset determination [7, 23], especially in signals with low SNR. Some studies set the threshold by optimization, but this can lead to overfitting because of the high variability of amplitude and SNR in sEMG recordings across trials and subjects.

To avoid the problems mentioned above, other strategies, such as the maximum likelihood ratio [26] and Gaussian mixture models [10, 15], have been implemented. These methods assume that the sEMG signal is generated by a Gaussian process, which is only an approximation, and they require prior information about the distribution underlying the sEMG signal.

In this study, we propose a time-frequency method based on the continuous wavelet transform (CWT) to detect bursts. The method relies on the energies of the scale coefficients rather than on amplitude measures. Our process includes a denoising step based on the discrete wavelet transform (DWT) to improve the SNR. We tested the proposed method on a database of sEMG signals recorded from the right infrahyoid muscles during swallowing tasks. For comparison, we processed the same signals with a previously reported threshold-based method that uses amplitude-based measures, namely TKEO and RMS [17], instead of energy measures from a time-frequency representation. This is one of the first approaches aiming to characterize swallowing-related signals automatically, as a preliminary step toward analyzing muscle behavior in patients with swallowing disorders.

2 Materials and Methods

2.1 Subjects

We recruited 38 healthy volunteers (16 males and 22 females) aged between 32 and 50 years (41.16 ± 6.12). The exclusion criteria were: dental braces, congenital oral malformations, active inflammatory processes (mouth, head, or neck), foreign objects in the mouth (such as piercings), diagnosed cognitive disorders (motor or sensory), chronic obstructive pulmonary disease, a history of head or neck cancer, or facial aesthetic surgery. All male subjects were clean-shaven. A balanced sex ratio was not considered necessary. Informed consent was obtained from each participant, and the study was approved by the Ethics Committee of the Instituto Tecnológico Metropolitano.

2.2 Signal Acquisition

We analyzed the right infrahyoid muscles during swallowing tasks. This muscle group plays an important role in the oral and pharyngeal phases of swallowing. Myoelectrical activity was measured with a differential bioamplifier connected to a PowerLab 16/35 polygraph (AD Instruments Inc.). The sEMG signal was acquired with non-polarizable, bipolar, disposable, pre-gelled Ag/AgCl electrodes (Ref. 2228, 3M; 30 mm \(\times \) 35 mm, 15 mm gel-area diameter, and 25 mm interelectrode distance). The reference electrode was placed on the forehead. Figure 1 shows the placement of the sEMG electrodes.

Fig. 1. Electrode placement for right infrahyoid sEMG acquisition.

The subjects swallowed the following boluses: thin liquid (water; 5, 10, and 20 mL), yogurt (3, 5, and 7 mL), and one saliva swallow. This gave seven tasks per subject and 266 sEMG recordings for analysis. These three consistencies have been used to assess penetration/aspiration in dysphagia [18]. Liquid and yogurt were delivered to the oral cavity from a 1.5 oz cup. Every task was video-recorded and began when the examiner instructed the subject to swallow. Video frames were synchronized with the signal acquisition.

2.3 Signal Pre-processing

The sEMG signals were acquired at a sampling frequency of \(F_s=2\) kHz. Pre-processing, storage, and visualization of the raw signals were carried out with LabChart Pro (AD Instruments Inc.). Offline analysis was performed with a custom program (Matlab, MathWorks Inc.). Because the acquired signals have a low signal-to-noise ratio (SNR) and are highly susceptible to crosstalk, an additional narrow-band filter was applied to improve burst detection: the signals were filtered with a 5th-order Butterworth bandpass filter between 90 and 250 Hz [17].

Each signal was visually inspected in the time domain in order to detect onset and offset of bursts. These times were saved for subsequent analysis and comparison with the automatic method proposed here.

2.4 Time-Frequency Method for Burst Detection

Every signal was denoised using a discrete wavelet transform (DWT)-based algorithm with soft minimax thresholding, symlet-8 as the mother wavelet, and 8 decomposition levels. To evaluate the time-frequency behavior of the denoised sEMG, we then applied a continuous wavelet transform (CWT) given by [20]:

$$\begin{aligned} CWT(s,\tau )=\frac{1}{\sqrt{s}}\int x(t)\psi ^*\left( \frac{t-\tau }{s}\right) dt \end{aligned}$$
(1)

where \(\psi (t)\) denotes the mother wavelet, the asterisk denotes complex conjugation, \(\tau \) is the translation in time, and s is a frequency-related scale. Although the input signal is one-dimensional, the CWT gives a two-dimensional representation of scale versus time (the scalogram).

The selection of the mother wavelet must be done carefully based on the nature of the signal. We selected the Haar wavelet for analyzing sEMG signals since we are looking for amplitude changes in the signal and it has low complexity.
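As an illustration of Eq. (1) with a Haar mother wavelet, the following is a minimal sketch in Python, not the Matlab implementation used in this study. It assumes integer, even scales expressed in samples, a sampling step of one sample, and zero padding at the signal edges. The function name `haar_cwt` is hypothetical.

```python
import math

def haar_cwt(x, scales):
    """Discretised Eq. (1) with a Haar mother wavelet (illustrative sketch).

    x      : list of samples
    scales : even integer widths in samples (an assumption of this sketch)
    Returns one row of coefficients per scale, each of length len(x).
    """
    n = len(x)
    # Prefix sums give the sum of any window of x in constant time.
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)

    def window_sum(a, b):
        # Sum of x[a:b]; indices are clipped, which acts as zero padding.
        a = min(max(a, 0), n)
        b = min(max(b, 0), n)
        return prefix[b] - prefix[a]

    rows = []
    for s in scales:
        half = s // 2
        row = []
        for tau in range(n):
            # Haar wavelet: +1 on the first half of its support, -1 on the second.
            pos = window_sum(tau, tau + half)
            neg = window_sum(tau + half, tau + 2 * half)
            row.append((pos - neg) / math.sqrt(s))
        rows.append(row)
    return rows
```

With this wavelet, each coefficient is a scaled difference between the sums of two adjacent windows, so it responds to local changes in the signal and is zero wherever the signal is constant over the wavelet support.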

The original scalogram was rescaled to the range 1–255, giving \(CWT_{rs}(s_{rs},\tau )\). We searched for times \(\tau \) at which the scale-coefficient energies exceed a background level \(E_{min}\). From \(CWT_{rs}(s_{rs},\tau )\), we built a projection vector \(Proj(\tau )\), which was used for burst detection. This procedure is summarized in Algorithm 1. We then defined a threshold on \(Proj(\tau )\), denoted \(Th_w\), that determines where onset and offset points are detected. This procedure is detailed in Algorithm 2. Both \(E_{min}\) and \(Th_w\) were optimized to obtain accurate burst detection.

[Algorithm 1: construction of the projection vector \(Proj(\tau )\) from the rescaled scalogram]
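The rescaling, projection, and thresholding steps can be sketched as follows. This is one possible reading of the text, not a transcription of Algorithms 1 and 2. It assumes that the energy is the squared coefficient, that \(Proj(\tau )\) counts the scales whose rescaled energy exceeds \(E_{min}\), and that a burst spans the samples where \(Proj(\tau ) \ge Th_w\). All function names are hypothetical.

```python
def rescale_1_255(scalogram):
    """Linearly map squared coefficients (assumed energy) to the range 1..255."""
    energy = [[c * c for c in row] for row in scalogram]
    lo = min(min(row) for row in energy)
    hi = max(max(row) for row in energy)
    span = (hi - lo) or 1.0  # guard against a completely flat scalogram
    return [[1 + 254 * (e - lo) / span for e in row] for row in energy]

def projection(cwt_rs, e_min):
    """Proj(tau): number of scales whose rescaled energy exceeds e_min."""
    n = len(cwt_rs[0])
    return [sum(1 for row in cwt_rs if row[t] > e_min) for t in range(n)]

def detect_bursts(proj, th_w):
    """Return (onset, offset) sample indices of runs where proj >= th_w."""
    bursts, start = [], None
    for t, p in enumerate(proj):
        if p >= th_w and start is None:
            start = t                      # rising edge: onset
        elif p < th_w and start is not None:
            bursts.append((start, t - 1))  # falling edge: offset
            start = None
    if start is not None:                  # burst still active at the end
        bursts.append((start, len(proj) - 1))
    return bursts
```

Under these assumptions, a sample is labelled as burst activity only when enough scales carry energy above the background level at the same time, regardless of how large that energy is.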

If the interval between the offset of a burst and the onset of the next one was shorter than 250 ms, we merged the two bursts into one. This avoids consecutive bursts with non-physiological durations and an artificial increase in the number of true positives.
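The merging rule can be illustrated with a short sketch. It assumes that bursts are given as (onset, offset) pairs in samples and that the gap is measured as onset minus previous offset. The default sampling frequency of 2 kHz is the one used in this study, and the function name `merge_close_bursts` is hypothetical.

```python
def merge_close_bursts(bursts, fs=2000, min_gap_ms=250):
    """Merge consecutive bursts separated by less than min_gap_ms.

    bursts : list of (onset, offset) sample indices
    fs     : sampling frequency in Hz
    """
    min_gap = min_gap_ms * fs / 1000.0  # gap expressed in samples
    merged = []
    for onset, offset in sorted(bursts):
        if merged and onset - merged[-1][1] < min_gap:
            # Gap too short: extend the previous burst instead of adding one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged
```

For example, at 2 kHz two bursts separated by 400 samples (200 ms) are merged, whereas bursts separated by 1000 samples (500 ms) are kept apart.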

[Algorithm 2: onset and offset detection by thresholding \(Proj(\tau )\) with \(Th_w\)]

2.5 TKEO+RMS for Burst Detection

We compared the method described above with a TKEO-based method for burst detection, which has been reported for signals with low SNR [21, 27]. The TKEO of a discrete signal x(t) is computed point by point as follows:

$$\begin{aligned} TKEO\left\{ x \left( t_{i} \right) \right\} = \left( x \left( t_{i} \right) \right) ^{2}-x\left( t_{i-1} \right) x\left( t_{i+1} \right) \end{aligned}$$
(2)
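Equation (2) translates directly into code. The sketch below is illustrative only. Setting the two endpoint values to zero is an assumption, since the operator is undefined where a neighboring sample is missing.

```python
def tkeo(x):
    """Teager-Kaiser energy operator, Eq. (2): x[i]^2 - x[i-1] * x[i+1].

    The first and last samples have no two-sided neighbourhood and are
    left at 0 (an assumption of this sketch).
    """
    out = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        out[i] = x[i] * x[i] - x[i - 1] * x[i + 1]
    return out
```

For a sampled sinusoid \(A\cos (\varOmega n+\phi )\), the operator returns the constant \(A^2\sin ^2(\varOmega )\), so its output grows with both the amplitude and the frequency of the signal.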

We computed the RMS over the TKEO signal according to [17]. The resulting signal is denoted as \(RMS_{TK}(t)\). We applied a fixed-size sliding window with a length of 250 ms and a step of 100 ms, so \(RMS_{TK}(t)\) is decimated in time by the windowing process. Burst detection used a threshold defined by the following expression:

$$\begin{aligned} Th_{TK}=\mu + h\sigma \end{aligned}$$
(3)

where \(\mu \) and \(\sigma \) are the mean and standard deviation of \(RMS_{TK}(t)\), respectively, and h is a preset parameter that sets the threshold level. The parameter h was fixed at 18 according to [17]. Bursts were then detected by thresholding with a process similar to Algorithm 1, with \(Proj(\tau )\) replaced by \(RMS_{TK}(t)\).
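The windowed RMS and the threshold of Eq. (3) can be sketched as follows. The sketch assumes a sampling frequency of 2 kHz, so that a 250 ms window is 500 samples and a 100 ms step is 200 samples. It also assumes that incomplete windows at the end are discarded and that \(\sigma \) is the population standard deviation. The function names are hypothetical.

```python
import math

def sliding_rms(x, fs=2000, win_ms=250, step_ms=100):
    """RMS over fixed-size windows; the output has one value per step."""
    win = int(win_ms * fs / 1000)
    step = int(step_ms * fs / 1000)
    out = []
    # Only complete windows are used (an assumption of this sketch).
    for start in range(0, len(x) - win + 1, step):
        segment = x[start:start + win]
        out.append(math.sqrt(sum(v * v for v in segment) / win))
    return out

def tk_threshold(rms_tk, h):
    """Eq. (3): mean plus h standard deviations of the RMS series."""
    n = len(rms_tk)
    mu = sum(rms_tk) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in rms_tk) / n)  # population std
    return mu + h * sigma
```

Because the step is shorter than the window, consecutive windows overlap by 150 ms, and the output is sampled every 100 ms rather than every 0.5 ms.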

2.6 Validation

A burst is delimited by its onset and offset points. A single trained person marked these points to avoid inter-expert variability. A mark was made if and only if two conditions were met: the laryngeal ascent was clearly identified in the video recording, and changes in amplitude and frequency were visible in the time-domain signal. The visual marks were used as the reference for assessing the proposed method.

We computed the true positives (TP), false positives (FP), and false negatives (FN) of the burst detection process over the whole dataset. Both the proposed CWT-based algorithm and the TKEO-based algorithm were evaluated against the visual marks. From TP, FP, and FN, we computed the precision (Pr), recall (R), and \(F_1\) score of each burst detection method as follows:

$$\begin{aligned} Pr = \frac{TP}{TP+FP}, \qquad R = \frac{TP}{TP+FN}, \qquad F_1 = \frac{2\times R\times Pr}{R+Pr}. \end{aligned}$$
(4)
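Equation (4) can be written as a small helper. The counts used in the usage note are purely illustrative and are not results from this study. Returning zero when a denominator is zero is an assumption of this sketch.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall and F1 score from detection counts, Eq. (4).

    Undefined ratios (zero denominators) are reported as 0.0,
    which is a convention chosen for this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * recall * precision / denom if denom else 0.0
    return precision, recall, f1
```

With illustrative counts of 8 true positives, 2 false positives, and 2 false negatives, precision, recall, and \(F_1\) are all 0.8. The \(F_1\) score drops whenever either precision or recall is low, which is why it is used here to balance false positives against false negatives.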

True negatives were not computed because they correspond to background segments, which are outside the scope of this work. For the CWT-based algorithm, we compared different combinations of the parameters \(E_{min}\) and \(Th_w\) and chose the pair that maximized the \(F_1\) score. For the TKEO-based method, we varied \(Th_{TK}\) to maximize the \(F_1\) score.
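The parameter selection described above amounts to an exhaustive grid search. The sketch below assumes a user-supplied callable, here called `score_fn` (a hypothetical name), that runs the detector over the dataset for a given pair of parameters and returns the \(F_1\) score. The tie-breaking rule, which keeps the first maximum found, is also an assumption.

```python
def grid_search(score_fn, e_min_values, th_w_values):
    """Return (best_f1, e_min, th_w) over all parameter combinations.

    score_fn : callable (e_min, th_w) -> F1 score; hypothetical placeholder
               for running the detector and scoring it against the marks.
    """
    best = None
    for e_min in e_min_values:
        for th_w in th_w_values:
            f1 = score_fn(e_min, th_w)
            # Strict '>' keeps the first maximum when scores are tied.
            if best is None or f1 > best[0]:
                best = (f1, e_min, th_w)
    return best
```

A call such as `grid_search(score_fn, range(1, 21), range(1, 31))` would cover 600 combinations of the two parameters. Because the parameters are tuned and evaluated on the same recordings in such a search, the resulting score should be read as an upper estimate of performance on new data.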

3 Results and Discussion

We propose a time-frequency method based on the CWT to detect bursts in sEMG signals from the infrahyoid muscles. The method uses two thresholds, \(E_{min}\) and \(Th_w\), which are applied to the CWT scales rather than to the sEMG amplitude. We performed an exhaustive search for the optimal parameters, with \(E_{min}\) varied between 1 and 20 and \(Th_w\) between 1 and 30. The optimal parameters were \(E_{min}=3\) and \(Th_w=18\). The left side of Fig. 2 shows the \(F_1\) score for variations of \(E_{min}\) when \(Th_w\) is fixed at its optimal value. Similarly, the right side of Fig. 2 shows the \(F_1\) score for variations of \(Th_w\) when \(E_{min}\) is fixed at its optimal value. \(E_{min}=3\) means that a point of the scalogram is marked as noise if its energy is less than or equal to 3 (on a scale from 1 to 255). \(Th_w=18\) means that at least 18 scales, in a scalogram with 64 scales, must contain energy components greater than \(E_{min}\). The optimal energy threshold \(E_{min}\) is very small, which means that in our method the scale-related threshold matters more than the energy threshold. This reduces problems related to amplitude variability in the signals.

Fig. 2. Effect on the \(F_1\) score when the following parameters vary: (left) \(Th_w\) (\(E_{min}\) fixed at 3); and (right) \(E_{min}\) (\(Th_{w}\) fixed at 18).

Table 1 compares the number of bursts detected by the proposed CWT-based method and by the TKEO-based method. With the optimal values of \(Th_w\) and \(E_{min}\), the CWT-based method achieves a higher \(F_1\) score (88.64%) than the TKEO-based method (83.46%). The proposed method also shows a better balance between precision and recall than TKEO.

Table 1. Counts of true positives (TP), false positives (FP), and false negatives (FN) for the proposed and TKEO-based methods over the whole dataset, together with the performance measures achieved by both methods.

Problems with traditional threshold-based methods for detecting muscle activation are widely reported. Because the threshold is user-selected, it can be especially hard to find the optimal value when the signal has low SNR, which degrades precision and recall. If the threshold is too low, the number of false positives increases (low precision), whereas if it is too high, the number of false negatives increases (low recall) [26]. This is especially true for the signals recorded in our work, where two factors come together: the SNR is low (below 10 dB) and the infrahyoid muscles are highly susceptible to crosstalk. Additionally, amplitude-based methods need prior knowledge of the signal baseline to estimate the threshold [9]. To obtain this baseline, the subject must be asked not to move during the first second of the recording. By contrast, the proposed time-frequency method does not need an initial background segment.

Merlo et al. proposed a CWT-based method for onset and offset detection [14]. However, their method depends on a scale-energy threshold and does not use additional information from the CWT scalogram. Our method measures the CWT coefficient energies and also uses how that energy is distributed across the scalogram. In this way, \(Th_w\) is a measure of the energy distribution across the frequency scales for each sample. Based on our observations, noise segments contain less energy across the different frequency components, and this behavior does not depend strongly on amplitude. Consequently, our method can detect onsets and offsets in signals with low amplitude and low SNR, such as sEMG signals from the infrahyoid muscles.

Figure 3A shows an example of a pre-processed sEMG signal with low SNR. Figure 3B shows the same signal after DWT denoising, which increases the SNR. Although three segments with activity are present in this signal, the scalogram (Fig. 3C) shows that only one of them has high energy components across most scales. After the projection process described in Algorithm 1, only one burst is detected. Figure 3D compares the reference marks with the burst detected by our method. Visual inspection against the video recorded during the protocol confirms that there is only one burst, associated with a single swallow.

Fig. 3. Example of burst detection by the proposed method: A. Signal filtered with 90–250 Hz bandpass filter, B. Signal after denoising with DWT, C. Scalogram applied to denoised signal, D. Burst detection estimated using the proposed method (blue line) and marks with visual inspection (red dashed line). (Color figure online)

By contrast, Fig. 4 shows burst detection using TKEO on the same signal used in Fig. 3. The TKEO is applied to the filtered signal (Fig. 4B), and a threshold is applied to the RMS signal computed over the TKEO time series (Fig. 4C). Figure 4D shows that three bursts are detected with this method, two of which are false positives. One of the main problems of this method is its dependence on the selected threshold. This behavior was observed throughout the database.

Fig. 4. Example of burst detection by the TKEO-based method: A. Filtered sEMG signal with 90–250 Hz bandpass filter, B. TKEO applied to filtered signal, C. \(RMS_{TK}(t)\) and thresholds for onset and offset detection, D. Burst detection estimated using the TKEO method (blue line) and marks with visual inspection (red dashed line). (Color figure online)

Several published methods measure the timing error of onset detection [10, 11, 22, 26, 27]. We instead computed performance measures for the presence or absence of bursts, even though our method can detect onset and offset times. This reflects a limitation of our database: the low SNR makes it difficult to mark onset and offset times accurately by visual inspection. In future work, simulated and real sEMG signals will be used to optimize the method and reduce the timing error. Real sEMG signals will be recorded in synchrony with other signals that serve as a reference for marking onset and offset times.

In future work, our method will be used for swallowing characterization. For instance, the duration of the bursts could be measured. This factor is crucial for swallowing analysis because the time required to swallow different kinds of boluses varies with the degree of impairment in dysphagia [2]. The method can also be applied to sEMG signals from other muscles involved in swallowing. In that case, onset and offset times would be used to establish the sequence of activation in healthy and pathological subjects during swallowing.

4 Conclusions

In this paper, we proposed a novel method for onset and offset detection in sEMG signals based on time-frequency analysis. The method uses the CWT scalogram to detect the times at which the scale-coefficient energies change. We used a scale-related threshold to ensure that those changes are present across a broad range of frequency scales. Our method avoids two major problems in sEMG burst detection: performance that depends heavily on amplitude measures, and the need for an initial segment without muscle activity. We tested the method on sEMG signals recorded from the infrahyoid muscle group during swallowing tasks. The results show that the proposed method is feasible for burst detection in signals with low SNR (\(F_{1}\) score = 88.64%). This method is a promising tool for sEMG segmentation in swallowing analysis.