Introduction

In the pharyngeal phase of swallowing, as the larynx moves up and forward, a quick succession of events begins with contraction of suprahyoid muscles (mylohyoid, geniohyoid, stylohyoid, anterior digastric, and thyrohyoid muscles). Structurally, each of these muscles is attached to the hyoid bone. When suprahyoid muscles contract, the hyoid bone moves up and forward. This superior-anterior hyoid movement triggers a chain of biomechanical events (e.g., thyrohyoid shortening) that enable the elevation of the hyolaryngeal complex and help open the upper esophageal sphincter [1].

The hyoid bone is constantly displaced while the person is swallowing, with movements that vary even more than those of the mandible and tongue. Its mechanical connections with the cranial base, mandible, sternum, and thyroid cartilage through the supra- and infrahyoid muscles enable it to play an important role in controlling the movements of the mandible, tongue, and hyolaryngeal complex [2, 3]. This displacement brings the larynx under the base of the tongue, while the epiglottis is retroverted to seal the laryngeal vestibule [1, 4, 5].

People with deficits in these muscles may be impaired by dysphagia, due to biomechanical difficulties in the transition from the oral to the pharyngeal phase of swallowing, weakening the elevation mechanism of the hyolaryngeal complex, and consequently the protection of the airway. This manifestation makes the trajectory of the bolus from the mouth to the stomach unsafe, possibly causing food/saliva/liquids to enter the airway. This in turn may result in cough, suffocation/asphyxiation, and aspiration, which may cause nutritional deficits, dehydration, weight loss, pulmonary problems, pneumonia, and death [6].

Kim and McCullough [7] found no differences between the sexes when assessing the maximum displacement of the hyoid bone in non-dysphagic older adults with ultrasound (US). They reported an approximate displacement of 2.62 cm when swallowing 5 ml of liquid. Chi-Fishman and Sonies [8] obtained a smaller result (2.0 cm) when assessing hyoid displacement in non-dysphagic older adults with 10 ml (liquid) and 20 ml (nectar-thick liquid). Hsiao et al. [9] observed that a displacement below 1.5 cm serves as a cutoff to detect dysphagia in patients who depend on a feeding tube (FOIS 1–3), with a sensitivity and specificity of 73.3% and 66.7%, respectively. Such decreased elevation is associated with aspiration [7, 10–13].

US has been a great help in assessing quantitative parameters of the oropharyngeal phase of swallowing [13]. It is a noninvasive technique that furnishes dynamic real-time images focused on soft tissues and structures of the body [14]. US has some advantages over traditional dysphagia diagnostic methods: it does not use either contrast or exposure to ionizing radiation, the equipment can be portable, and it has a low cost [15, 16].

The analysis of hyoid bone displacement amplitude with US helps better understand the elevation mechanism of the hyolaryngeal complex, to which the hyoid belongs. Hence, the reliability of this US measure must be assessed in order to use this parameter to reach a more precise dysphagia diagnosis, establish normal values, and plan more specific and directed therapies [17].

US assessment of hyolaryngeal structure arrangement in the swallowing process is rather varied, and studies differ in how they approach the qualitative and quantitative assessment of hyoid bone displacement amplitude in swallowing. This measure is the most analyzed because it is key to understanding the pharyngeal phase of swallowing, and it is potentially a reference measure of therapeutic gain [10, 17, 18]. Hence, this measure and its reliability must be standardized. Based on this assumption, this systematic review aimed to analyze the reliability of measuring hyoid bone displacement amplitude in swallowing with US.

Methods

Protocol and Registry

This systematic review followed recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [19], and the review protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO) under number CRD42020164655.

Developing the Research Question and Eligibility Criteria

The research question was developed using three components, which met the study profile, structured the investigation, and guided the search strategy. Each letter represents a relevant aspect to better define the question: P (patient), adults and older adults, healthy and/or with swallowing disorders; C (comparison), intraclass correlation coefficient; O (outcome), US measure of hyoid bone displacement in swallowing [20]. Thus, this review was conducted based on the following question: “Is the US measure of hyoid bone displacement amplitude in swallowing reliable in dysphagic and non-dysphagic adults?”

The review included studies published up to July 2020 that analyzed the reliability of US measures of hyoid bone displacement amplitude in swallowing in adults and/or older adults, with or without swallowing disorders. The following were excluded: abstracts and conference proceedings, literature reviews, studies not available in full text, studies that used US only to assess the esophageal phase of swallowing, studies that included children in the sample, and studies that used US to analyze tongue movement in speech, hyoid-laryngeal approximation, or newborns’ sucking. Studies on the therapeutic use of US were not included, and there was no language restriction.

Research Strategy

The terms were chosen based on the conceptual block macrostructure, in which each block represents a field to be investigated in relation to another one. The research terms were validated with the Medical Subject Headings (MeSH) for MEDLINE, Scopus, Web of Science, and Cochrane Library, and with Emtree terms for Embase. After being validated, the descriptors were used as the basis of the search, determining the synonyms and relationships we would make. The Boolean operator “AND” was used in the search strategy, which was directed with Ultrasonography AND Pharynx AND Deglutition. Based on this, an advanced manual search was developed for each database along with conceptual equivalents (Table 1).

Table 1 Search strategy customized per database

The search was made on July 14, 2020, including the whole retrospective period indexed by the databases. Aiming to identify relevant studies that were not found in the electronic search, a manual search was made, analyzing references of the articles selected to be read in full. The results were last updated on June 24, 2021. No author was contacted to identify studies, get additional information, or add studies to the results and meta-analyses after the new update.

Study Selection

The article selection process was conducted in a blind, paired, and independent way and was divided into three stages, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart [19], including the stages of study identification, screening, and inclusion (Fig. 1).

Fig. 1 Flowchart of the search and selection phases of the systematic review of the ultrasound intraclass correlation coefficient for maximum hyoid bone displacement in swallowing

In the initial stage, two independent reviewers identified all references retrieved in the search, with an adapted search strategy for each database. The Mendeley software (https://www.mendeley.com/) was used to manage, store, analyze, and remove duplicate references.

The screening consisted of reading the titles and abstracts to dismiss studies that did not meet the preestablished selection criteria and keep the potentially eligible ones. Two reviewers conducted this stage, and any disagreement between them was resolved by consensus. Afterward, the full text of each manuscript was read to verify whether the selected references answered the research question and met the eligibility criteria. Hence, the articles had to be focused on assessing the reliability of the US measure of hyoid bone displacement amplitude. Two paired, independent, and blind reviewers evaluated the full texts. Disagreements were resolved through discussion with a third reviewer, who is experienced in US assessment and systematic reviews.

All studies that passed the previous stages were included for analysis of the following data of interest: study identification, methodological design, study population characteristics, and protocols for US acquisition and analysis. The kappa interrater agreement coefficient in the article selection phase was strong (0.75).
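The agreement coefficient reported above can be illustrated with a short calculation. The text does not state which kappa variant was used; the sketch below assumes Cohen's kappa for two reviewers, and the function name `cohens_kappa` and the screening decisions are hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Assumes both lists have equal length and that chance agreement
    is below 1 (otherwise the denominator would be zero).
    """
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal counts.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for 20 records (not data from this review).
reviewer_1 = ["include"] * 10 + ["exclude"] * 10
reviewer_2 = ["include"] * 9 + ["exclude"] + ["include"] + ["exclude"] * 9
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Because kappa discounts chance agreement, it is typically lower than the raw percentage of agreement computed on the same decisions.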

Data Collection

Two raters independently extracted the qualitative data and outcome of each article included. They had been trained for calibration to ensure consistency and refinement of the data extraction spreadsheet. The extraction followed the Cochrane Handbook for Systematic Reviews of Interventions [21]. The following information from all studies that met the eligibility criteria was tabulated in a Microsoft Excel spreadsheet (Microsoft Corporation, WA, USA): first author, title, and year of publication; type of study; descriptive data (sample number [both total and per sex], age group, country where the study was conducted, sample selection criteria, objective, volume and consistency used to assess swallowing, US equipment used, transducer position to take images, time to take images/frames, participant’s position during swallowing assessment, number of examiners, condition in which the images were assessed [recorded or real-time images], time interval between image assessments, description of the method used to assess hyoid bone displacement); and reliability data of hyoid bone displacement (intra- and interrater intraclass correlation coefficient [ICC] values).

Evaluation of the Risk of Bias

Two independent raters evaluated the quality of the studies using the Quality Appraisal of Reliability Studies (QAREL) [22], of which 11 items cover the following seven domains: the spectrum of subjects, the spectrum of examiners, examiner blinding, the order effects of examination, the suitability of the time interval between repeated measurements, appropriate test application and interpretation, and appropriate statistical analysis.

Each item can be answered with “yes,” “no,” “unclear,” or “not applicable” – “yes” suggests a good-quality study resource, whereas “no” indicates a poor-quality resource [22]. If the two raters disagreed regarding these answers, they discussed their reasons, and the final decision was reached by consensus, following recommendations of the Cochrane Handbook for Systematic Reviews of Interventions [21]. If no consensus was reached, a third author was asked to judge and solve the issues.

Data Analysis

The qualitative data were organized and described in tables, and the meta-analysis was performed with the Comprehensive Meta-Analysis software. The intra- and interrater reliability estimates were meta-analyzed separately. The reliability coefficients were analyzed as Fisher’s r-to-z transformed correlation coefficients. A random-effects model was used because of the heterogeneity of the studies. The significance level was set at p < 0.05. Statistical heterogeneity was assessed with the Cochran Q test and the I² statistic. Funnel plots were developed, and the Egger test was used to determine whether publication bias could threaten the validity of the meta-analysis results.
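The pooling steps described above can be sketched in a few lines. This is not the Comprehensive Meta-Analysis implementation. It is a minimal illustration that assumes each coefficient is treated like a correlation with variance 1/(n − 3) on the Fisher z scale and that the between-study variance is estimated with the DerSimonian-Laird method. The function name `pool_fisher_z` and the input values are hypothetical.

```python
import math


def pool_fisher_z(coeffs, sizes, z_crit=1.96):
    """Random-effects pooling of correlation-type coefficients.

    coeffs: reliability coefficients, each strictly between -1 and 1.
    sizes:  sample size behind each coefficient (each must exceed 3).
    """
    z = [math.atanh(r) for r in coeffs]          # Fisher r-to-z transform
    v = [1.0 / (n - 3) for n in sizes]           # assumed variance of each z
    w = [1.0 / vi for vi in v]                   # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    # Cochran Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    df = len(z) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # DerSimonian-Laird
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I² in percent
    w_re = [1.0 / (vi + tau2) for vi in v]               # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": math.tanh(z_re),               # back-transform to r scale
        "ci": (math.tanh(z_re - z_crit * se), math.tanh(z_re + z_crit * se)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical coefficients and sample sizes (not the values of the included studies).
result = pool_fisher_z([0.85, 0.90, 0.80], [25, 10, 10])
```

Pooling on the z scale and back-transforming with the hyperbolic tangent keeps the pooled estimate and its confidence limits inside the valid range of the coefficient.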

Results

A total of 1,559 articles were retrieved with the database search, 78 of which were excluded for being duplicates. Thus, the titles and abstracts of 1,481 articles were screened, and 63 articles were read in full, with a 98.38% interrater agreement. The manual search of references identified 27 potential studies, although they were all excluded: 15 for being duplicates and the others for having a type, outcome, or population different from the eligibility criteria. Hence, only three articles analyzed the ICC for maximum hyoid bone displacement and comprised the quantitative and qualitative analysis of this systematic review (Fig. 1).

This systematic review, unlike others already published that assessed the diagnostic precision of US in detecting aspiration and pharyngeal residue in patients with dysphagia [23], aimed to assess the reliability of US measures of hyoid bone displacement amplitude. There was an interrater reliability of 0.858 (95% CI: 0.744–0.924) and an intrarater reliability of 0.968 (95% CI: 0.903–0.990).

The studies by Chen et al. [11], Macrae et al. [24], and Hsiao et al. [9] had observational methodological designs; two were conducted in Taiwan and one in New Zealand, and they were published between 2012 and 2017 (Table 2). Altogether, 45 US images of maximum hyoid bone displacement in swallowing were used to assess the reliability of US with the intraclass correlation coefficient. Macrae et al. [24] assessed five non-dysphagic (healthy) people aged 20 to 50 years (two men and three women); Hsiao et al. [9] assessed 10 healthy people but did not report their age or sex. Chen et al. [11] assessed 10 men aged 54 to 81 years with dysphagia caused by stroke, neuromuscular disease, traumatic brain injury, chronic obstructive pulmonary disease, spinal cord injury, aspiration pneumonia, and gastroesophageal reflux disease. Regarding the Functional Oral Intake Scale (FOIS), one patient was classified as FOIS 1, four as FOIS 2, one as FOIS 3, one as FOIS 4, two as FOIS 5, and one as FOIS 6; hence, five of them used a feeding tube, while five had full oral feeding.

Table 2 Individual characteristics of the studies included in the ultrasound agreement analysis of maximum hyoid bone displacement in swallowing

Chen et al. [11] used a US machine they had developed, with a curvilinear transducer (Convex Array, 3.5 MHz, P701-C04; LELTEK Corporation, Taipei City, Taiwan) connected to a laptop computer and placed on a cart, enabling it to be used at the bedside (US machine LT701, LT701-000; LELTEK Corporation). Hsiao et al. [9] also used their own US machine model with a curvilinear transducer (BS3C673 Convex Array, 3.5 MHz, BSUS20-32C; Broadsound Corporation, Taiwan), connected to a laptop computer and placed on a cart, likewise enabling it to be used at the bedside. Only one study [24] used an IU22 scanner (Philips Ultrasound, Bothell, WA) with a 5–1 MHz curved transducer. The images were recorded at a rate of 22.5–30 frames per second.

Before acquiring the data, Macrae et al. [24] defined the reference point of hyoid bone displacement at the crossing of the shadow projected by the genial tubercles and the echogenic surface of the mandibular bone. The maximum hyoid bone displacement was defined based on the distance between the shadows created by the hyoid bone intersected with the geniohyoid muscle. The mandible was also used as a reference point by Chen et al. [11] and Hsiao et al. [9], who defined the reference point as the anterior inferior border of the acoustic shadow of the mandible. Using a two-axis coordinate system, the hyoid bone position in relation to the mandible was represented in each frame as paired coordinates. The distance between the coordinates before and during swallowing determined the hyoid bone displacement.
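The coordinate-based measure described above amounts to a Euclidean distance between two points. The sketch below is only an illustration of that arithmetic, not the software used by Chen et al. [11] or Hsiao et al. [9]; the function name `max_hyoid_displacement`, the choice of the first frame as the resting position, and all coordinates are hypothetical.

```python
import math


def max_hyoid_displacement(rest_xy, frames_xy):
    """Largest Euclidean distance between the resting hyoid position and
    its position in any frame, in the same unit as the coordinates."""
    x0, y0 = rest_xy
    return max(math.hypot(x - x0, y - y0) for x, y in frames_xy)


# Hypothetical hyoid coordinates in cm, relative to the mandibular reference point.
rest = (4.0, 1.0)
frames = [(4.0, 1.0), (3.4, 1.6), (2.8, 2.2), (3.5, 1.4)]
max_disp = max_hyoid_displacement(rest, frames)
```

Expressing each position relative to the mandibular reference point, rather than to the image frame, is what makes the distance less sensitive to small shifts of the transducer between frames.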

The Quality Appraisal of Reliability Studies (QAREL) tool [22] (Table 3), applied to assess the quality of the studies [9, 11, 24], revealed that US was used in a sample representative of the population to whom the authors had meant to apply the results. In this regard, the study objective is to analyze the reliability of the measure, regardless of sample characteristics. It was not clear, in any of the studies, whether the raters who conducted them represented the public to whom the results would be applied. Only the study by Macrae et al. [24] reported that the raters were blind to the findings of other raters and to their own during the study. The test was correctly applied and adequately interpreted (according to criteria preestablished by the authors) in all studies, and the ICC was used as an adequate statistical measure of agreement. The study by Hsiao et al. [9] was the only one that did not make clear the adequacy of the time interval between measurements; this interval was described by the other authors. None of the studies described the order in which the images were analyzed for intra- and interrater reliability.

Table 3 Quality assessment of the studies included in the ultrasound agreement analysis of maximum hyoid bone displacement in swallowing

The study by Macrae et al. [24] presented a maximum displacement of 3.1–3.9 cm in the 25 saliva swallowing assessments in a population without dysphagia. Chen et al. [11] found means of 1.6 and 1.5 cm, one for each rater, in the swallowing of 5 ml of water in a population with dysphagia. Hsiao et al. [9] found a mean maximum hyoid bone displacement of 1.7 cm in the swallowing of 5 ml in a population without dysphagia. The intraclass correlation coefficients of the studies revealed excellent reproducibility (ICC ≥ 0.8) (Table 4).

Table 4 Summarized statistics of the studies included in the ultrasound agreement analysis of maximum hyoid bone displacement in swallowing

Macrae et al. [24] report in their study that they recorded 8 s videos for each saliva swallow (five swallows were recorded for each of the five participants) with 30 s intervals in between swallows. They used minimal transducer pressure under the floor of the mouth surface and gel for better acoustic coupling. The sonograms were acquired and later processed off-line (the reliability of the study by Macrae et al. [24], as well as that of the other studies, reflects data measurement, and not US acquisition). The participants were instructed to keep both their head and tongue relaxed (while they were not swallowing) and not flex their necks.

Depth and gain configurations were made to accommodate each participant’s anatomy and enable visualization of US shadows. The lead researcher acquired all US images and, along with two independent and blind coresearchers, concluded the data analysis to verify interrater reliability. For intrarater reliability, the lead researcher, blind to initial measures, assessed the data a second time in a single session, on the same day.

Hsiao et al. [9] analyzed single-frame US images of 10 non-dysphagic people, recorded at a rate of 22.5 frames per second. In their study, one examiner made the recordings, while another one repeated the examination for reliability analysis, both having been trained for 1 month. The transducer was slightly in contact with the submental region and was manually positioned, following the previously described technique [25]. The participants were instructed to keep their heads steady as they swallowed 5 ml of water in each of the three attempts, establishing mean values for reliability analysis.

Chen et al. [11] recorded each US examination as a series of dynamic images, recorded at a rate of 30 frames per second and stored on a laptop. They attached to the transducer a water-based coating they had developed to increase contact between the skin and the transducer. The 10 dysphagic participants were instructed to keep their heads steady as they swallowed 5 ml of water three times. The best of the three recordings was later analyzed.

The maximum hyoid bone displacement measure was analyzed by two authors. For intrarater reliability assessment, each of the two authors took measures twice from each patient, with an interval of > 48 h. For interrater reliability, the first measures taken by the two authors were compared with those from another author.
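To make the reliability coefficient concrete, the sketch below computes an intraclass correlation coefficient from a subjects-by-raters table. The passages above do not state which ICC form each study used, so the choice of the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is an assumption made for illustration. The function name `icc_2_1` and the displacement values are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of rows, one row per subject, one column per rater.
    Needs at least two subjects and two raters.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                  # between-subject mean square
    ms_cols = ss_cols / (k - 1)                  # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical maximum displacements (cm) measured by two raters on five swallows.
measurements = [[1.5, 1.6], [1.8, 1.7], [2.1, 2.2], [1.2, 1.3], [2.6, 2.5]]
icc = icc_2_1(measurements)
```

For intrarater reliability, the same table layout can be used with the columns holding the first and second measurement sessions of one rater instead of two different raters.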

In the forest plot (Fig. 2), with an interrater intraclass correlation coefficient (ICC) of 0.858 (95% CI: 0.744–0.924), the similarity between individual study results reflects an absence of heterogeneity. Although the ICC values were good to excellent, attention must be paid to the significant heterogeneity (p = 0.005) of the intrarater ICC, given the subjectivity of the rater working with parameters that had been previously established for the assessment. The study by Hsiao et al. [9] obtained an intrarater ICC of 0.842 for one of its raters, which may have influenced sample heterogeneity. The time (30 days) taken to train the raters and their experience in assessing maximum hyoid bone displacement when swallowing 5 ml of water may have influenced image analysis, as may the number of frames per second, the uncertain time intervals between intrarater reliability measures, and possible graphic imprecision due to the subject’s instability, since the examination depends entirely on the participant’s cooperation.

Fig. 2 Forest plot of interrater reliability

The meta-analysis presented, for interrater reliability, an ICC = 0.858 (95% CI: 0.744–0.924) (Fig. 2) and null heterogeneity (Fig. 3). The effect size was significant (p < 0.001). The interrater reliability funnel plot is shown in Fig. 4. The Egger test had a value of -0.058 (p = 0.954), which dismissed the risk of publication bias.

Fig. 3 Heterogeneity analysis of interrater reliability

Fig. 4 Funnel plot of interrater reliability

For intrarater reliability, there was an ICC = 0.968 (95% CI: 0.903–0.990) (Fig. 5). The effect size was significant (p < 0.001). However, the Cochran Q test indicated significant heterogeneity (p = 0.005), with the I² statistic showing that 73.25% of the variability in effect estimates is due to heterogeneity (Fig. 6). The intrarater reliability funnel plot is shown in Fig. 7. The Egger test had a value of -0.380 (p = 0.704), which dismissed the risk of publication bias.

Fig. 5 Forest plot of intrarater reliability

Fig. 6 Heterogeneity analysis of intrarater reliability

Fig. 7 Funnel plot of intrarater reliability

Discussion

Rocha et al. and Costa et al. [16, 26] restate that US swallowing assessments require specific training with a specialized professional and knowledge of the anatomical structures assessed, imaging procedures, and system operation. Besides not making clear how many frames per second were used in its methodology, the study reports that no head stabilizer was used. The authors recognize this as a limiting factor, since US examinations require the participants’ cooperation to ensure stable transducer positioning and good image quality. Chi-Fishman and Sonies [8, 27] show the importance of the head stabilizer to increase contact between the transducer and the skin and ensure better images. The stabilizer makes it possible to fix the transducer at the same point of contact when the patient’s positioning cannot be ensured in different sessions.

This review observed that the studies only described the raters’ time of experience, their training to handle, acquire, and analyze US data, and the setting where they took measures. Despite this limitation, the ICCs shown by the studies [9, 11, 24] with the transducer positioned in the submandibular region had good reproducibility (Table 3) in both dysphagic and non-dysphagic patients. However, the authors did not describe whether the images were randomized during US analyses; if there was no randomization, this may have inflated intrarater reproducibility. Previously published studies [11, 14, 16, 24, 28–33] reveal that positioning the transducer in the submental region is most often used to measure and analyze hyoid bone displacement. B-mode US of swallowing is a method that enables real-time visualization of the hyoid bone and its displacement [31]. Hence, it precisely establishes the duration of swallowing and the trajectory of the hyoid movement. For Sonies et al. [31], this method can help characterize normal and abnormal movements of the hyoid bone and adjacent muscles during swallowing biomechanics. Similar results are found with videofluoroscopy.

Some limitations are described by the studies [9, 11, 24]. Hsiao et al. [9] referred to the sample size, the unfeasibility of severely dysphagic patients taking the liquid swallowing test, and the difficult contact of the transducer with the skin due to increased thyroid cartilage as limiting factors. Chen et al. [11] also considered their small sample as a limitation, besides having analyzed only men and used software they had developed, which may not apply to every US machine. The limiting factor pointed out by Macrae et al. [24] in their study is that one rater acquired the data while others analyzed them. They suggest that in future studies the same rater acquire and analyze the data; they also report the need for a head stabilizer for a constant point of reference used in the first acquisition.

This review has some limitations. Since this is a systematic review with intraclass correlation coefficients, the quality of evidence could not be assessed because no validated critical assessment tool was found that would fit this type of methodological design. The risk of bias, on the other hand, was analyzed with a tool validated for reliability studies (QAREL) [22]. The lack of some important information to analyze the risk of bias and characterize the study was likewise a limiting factor. Thus, we take a critical look at the results, instead of generalizing them to all people, food consistencies, and assessment clinical conditions.

The results presented in this review show the possibility of using US to assess hyoid bone displacement. Some care needs to be taken when using the method in clinical practice: the examiner must be trained, the patient must be well positioned, and the transducer must be slightly in contact with the skin, using plenty of water-soluble gel. For continuous patient analyses, a head stabilizer should be used to ensure precise results. Moreover, a single examiner should be responsible for the acquisition and analysis. If the assessment is made in a stressor setting, the analysis can be made off-line, ensuring more reliable results [9, 11, 16, 24, 26].

Studies with larger populations, both dysphagic and non-dysphagic, are needed, with greater methodological control and more standardized protocol descriptions. Moreover, further studies must compare the US method with other validated and trusted examinations for dysphagia diagnosis. Hyoid bone displacement needs to be studied in more depth and with different food consistencies, sexes, and ages. Lastly, studies comparing the examination applied in controlled (research laboratory) and stressor settings (offices, ICUs, and hospital wards) are important for its validation in clinical practice.

Conclusion

The evidence suggests that US examination has good reliability for the assessment of hyoid bone displacement amplitude in swallowing. However, the effect heterogeneity, limitations, and methodological variability of the studies weaken such results. The results showed reliability for off-line image analysis, with the transducer positioned in the submandibular region and the patient in a vertical position.

Further studies are needed to analyze the reliability of this measure in clinical practice, for real-time acquisition and analysis, in larger populations, with different food consistencies, and for accurate diagnosis.