Introduction

During the swallowing of a liquid, the tongue controls the liquid bolus in the oral cavity, changing shape, position, and constriction to transport it down into the pharynx. In the oral preparatory phase, the extent of oral manipulation of the bolus varies greatly. When placed into the mouth, a liquid bolus has a certain degree of cohesiveness that may be maintained as the bolus is held between the tongue and the anterior hard palate in preparation for the pharyngeal swallow [1]. During the oral phase, lingual movement propels the bolus posteriorly, and the buccal musculature ensures that material does not fall into the lateral sulci [2].

There are various methods for measuring tongue movement during swallowing: videofluoroscopy [3], magnetic resonance imaging [4], and ultrasound [5]. Videofluoroscopy is considered the gold standard for evaluating swallowing; however, there are risks associated with radiation exposure [1]. Ultrasound is simple and repeatable, and its dynamic soft tissue imaging may make it superior to other modalities for swallowing research. Swallowing can also be viewed in real time, or the stream of images can be recorded for later analysis.

B-mode (2D) ultrasound has also been used to assess lingual propulsion during oral bolus transport. However, M-mode traces have been more commonly used; these represent sequential movements of the tongue dorsum during swallowing as a display of linear distance against time, from which quantitative measurements can be made [6]. Peng et al. [7] divided the swallowing process into five phases based on specific inflection points identified on M-mode traces. However, B-mode imaging has not clearly been used in the quantitative assessment of tongue movements.

Given recent advances that enable the capture and analysis of high-speed B-mode ultrasound [8], we aimed to test this hypothesis and measure certain spatial and dynamic aspects of swallowing in a consistent manner across participants, using B-mode images captured from a probe located on the submental surface and scanning the mid-sagittal plane. Moreover, ultrasound imaging was explored for possible use in the qualitative evaluation of tongue movements associated with different volumes and viscosities.

Method

Participants

Eleven healthy adults (two men and nine women) between the ages of 19 and 35 years were enrolled in the study. All participants completed a health questionnaire and reported no history of swallowing difficulty, neurological conditions, or gastrointestinal disorders. Data were collected after ethical approval had been obtained following standard procedures at Queen Margaret University. Informed consent was obtained from all participants.

Stimulus Characterization

The thickened liquid chosen for use in this study was a commercially available syrup-thickened apple juice (Thickened Drinks, Resource Brand, Nestlé Nutrition, UK). The thin liquid was apple juice (Cloudy Apple Juice, Copella, PepsiCo, USA). Liquid boluses of 10 and 25 ml were administered at room temperature.

Protocol

The participants drank from a cup, and the task order was randomized. The subjects were seated during the study. Each subject held the bolus in the oral cavity and swallowed it 3–5 s later, during noninstructed drinking. There were three swallowing tokens for each of the four combinations of volume and viscosity, for a total of 12 swallows per subject.

Procedures

Ultrasound data were acquired using an Articulate Instruments EchoB system remotely controlled via USB from a personal computer running Articulate Assistant Advanced (AAA) software version 2.16 [8]. The echo return data were recorded at a rate of 54 frames per second (fps) with 127 beam-formed echo pulses evenly spread over a 104° field of view (FOV), providing an angular resolution of approximately 0.8°; thus, when the tongue’s surface is 5 cm from the probe, the distance between the scanlines is under 1 mm. The depth setting was 90 mm, and the resolution along each scanline was well below 1 mm. Standard graphical interpolation was performed on the raw data to convert it to an image for analysis in AAA, similar to the image processing that is normally carried out within a dedicated ultrasound scanner. The low-frequency convex transducer operated at 4 MHz. The participant was fitted with a headset to stabilize the ultrasound probe (Fig. 1).
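The scanline geometry quoted above can be checked with a short calculation. The following is a minimal Python sketch, not part of the original acquisition software; the function names are hypothetical, and it assumes that the 127 scanlines include one at each edge of the field of view, so that there are 126 equal angular steps.

```python
import math

# Values quoted in the text above.
FIELD_OF_VIEW_DEG = 104.0  # total angular spread of one frame
N_SCANLINES = 127          # beam-formed echo pulses per frame


def angular_step_deg(fov_deg: float, n_lines: int) -> float:
    """Angle between adjacent scanlines spread evenly across the field of view.

    Assumption: the outermost scanlines sit on the edges of the field of
    view, so n_lines scanlines are separated by n_lines - 1 equal steps.
    """
    return fov_deg / (n_lines - 1)


def scanline_spacing_mm(radius_mm: float, step_deg: float) -> float:
    """Arc length between adjacent scanlines at a given distance from the probe."""
    return radius_mm * math.radians(step_deg)


step = angular_step_deg(FIELD_OF_VIEW_DEG, N_SCANLINES)
spacing_at_5_cm = scanline_spacing_mm(50.0, step)
print(f"angular step: {step:.2f} degrees")
print(f"scanline spacing at 5 cm: {spacing_at_5_cm:.2f} mm")
```

Under these assumptions, the angular step comes out slightly above 0.8° and the spacing at 5 cm stays below 1 mm, in line with the figures given in the text.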

Fig. 1

Picture of the recording setup: a participant wearing the headset

Tongue Curve Spline Drawing

A measurement fan with 42 radii is provided by the AAA software. The control points (knots) of the tongue curve spline are thus constrained to lie on the 42 radial axes of a fan grid. The location where the tongue-surface spline intersects each radial axis becomes a control point, which can be set automatically with the AAA edge-detection algorithm; this algorithm detects dark–bright discontinuities in the image and smooths the spline. The search space had an upper limit set to traces of the hard palate on sequential video frames (i.e., at about 30 ms intervals) [9]. Errors were corrected manually by adjusting the search space or the spline itself. In this way, a spline was fitted to the visible surface of the midsagittal tongue in every relevant video frame. The hard palate was visible in certain frames either because the ultrasound scan passed through the bolus and reflected off the boundary between bolus and hard palate or, when the tongue was in contact with the palate, because it was reflected off the tongue’s surface (which had adopted the shape of the hard palate).
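The idea of placing a knot at a dark–bright discontinuity on one radial axis can be illustrated with a simplified Python sketch. This is not the AAA edge-detection algorithm, whose details are not given here; the function name, the brightness profile, and the search window are hypothetical, and no spline smoothing is attempted.

```python
from typing import Optional, Sequence


def find_surface_index(
    intensities: Sequence[float],
    search_start: int,
    search_end: int,
) -> Optional[int]:
    """Return the sample index of the sharpest dark-to-bright step on one radius.

    `intensities` holds echo brightness sampled outward from the probe along a
    single fan line. The window [search_start, search_end) is a hypothetical
    stand-in for the search space, whose upper limit follows the palate trace.
    Returns None when no brightening step exists inside the window.
    """
    best_index = None
    best_step = 0.0
    last = min(search_end, len(intensities))
    for i in range(max(search_start, 1), last):
        step = intensities[i] - intensities[i - 1]  # positive means dark to bright
        if step > best_step:
            best_step = step
            best_index = i
    return best_index


# Made-up brightness profile: dark tissue, a bright surface echo, then fading.
profile = [5, 6, 5, 7, 40, 90, 60, 30, 10]
knot = find_surface_index(profile, search_start=1, search_end=8)
print(knot)
```

Repeating such a search on each of the 42 radial axes would give one candidate control point per axis, which could then be corrected manually as described above.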

The tongue’s surface was traced in AAA from the start time, at which time the tongue moved up toward the palate at the start of swallowing, to the end time, when the entire tongue was in contact with the palate (cf. four points on the continuum in Fig. 2).

Fig. 2

The tongue’s surface was traced during swallowing in the mid-sagittal plane (visualized with AAA). The first frame shows the bolus between tongue blade and hard palate

Three-Dimensional (3D) Waterfall Diagrams

In AAA, 3D waterfall diagrams were generated from the tongue splines fitted to the tongue’s surface on sequential video frames from the start time to the end time.

Distance from the Tongue Surface to the Palate (Depression Depth)

For each frame (i.e., time point), the distance (mm) from the tongue’s surface to the palate was exported, measuring along the radial fan lines (the arrows in Fig. 3).

Fig. 3

Schematic illustration of the distance (in mm) calculated using AAA software, measured along each radial fan line from the point where the tongue’s surface intersected the fan line to the point where the hard palate intersected it, in each individual plot

Tongue Depression Depth

Each frame’s tongue spline contained a depression, or concavity in the tongue’s surface, like a cup. The size of this depression and its location were defined using the greatest radial distance from the location of the hard palate to the visible tongue’s surface (the longest arrow in Fig. 3) as a proxy measure of its depth.
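The depth measure defined above can be sketched in Python for a single frame. This is an illustrative reconstruction rather than the AAA export routine; the function names and the radii are hypothetical, and each spline is assumed to be stored as a mapping from fan-line number to radial distance (mm) from the fan origin.

```python
from typing import Dict, Tuple


def radial_distances(
    tongue_mm: Dict[int, float], palate_mm: Dict[int, float]
) -> Dict[int, float]:
    """Palate-minus-tongue radius on every fan line where both were traced."""
    shared = sorted(set(tongue_mm) & set(palate_mm))
    return {line: palate_mm[line] - tongue_mm[line] for line in shared}


def depression_depth(
    tongue_mm: Dict[int, float], palate_mm: Dict[int, float]
) -> Tuple[int, float]:
    """Fan line and size (mm) of the greatest tongue-to-palate distance."""
    distances = radial_distances(tongue_mm, palate_mm)
    if not distances:
        raise ValueError("no fan line has both a tongue and a palate point")
    line = max(distances, key=distances.get)
    return line, distances[line]


# Illustrative radii only (mm from the fan origin), not study data.
palate = {10: 70.0, 11: 72.0, 12: 73.0, 13: 72.5}
tongue = {10: 69.0, 11: 60.0, 12: 55.0, 13: 64.0}
deepest_line, depth_mm = depression_depth(tongue, palate)
print(deepest_line, depth_mm)
```

The fan line returned alongside the depth serves as the location of the depression; taking the maximum of the per-frame depths over a swallow would give one value per swallowing sequence.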

Statistical Analyses

Data were entered into a spreadsheet and imported into R (version 3.1.2) for further analysis. To address questions regarding variations in volume and viscosity, and the question regarding variation in the number of radial fan lines, repeated-measures analyses of variance (ANOVAs) and Bonferroni’s post hoc test with a within-participant factor of clear swallow were performed.
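The Bonferroni step of such a post hoc analysis can be illustrated with a small Python sketch. The analysis in this study was run in R; the code below is not that script, the function names are hypothetical, the condition labels are chosen for illustration, and the p-values are placeholders rather than study results.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Condition labels chosen for illustration.
CONDITIONS = ["10 ml thin", "25 ml thickened", "25 ml thin"]


def pairwise_comparisons(conditions: List[str]) -> List[Tuple[str, str]]:
    """All unordered pairs of conditions to be compared post hoc."""
    return list(combinations(conditions, 2))


def bonferroni_adjust(
    raw_p: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(raw_p)
    return {pair: min(1.0, p * m) for pair, p in raw_p.items()}


pairs = pairwise_comparisons(CONDITIONS)
# Placeholder p-values for illustration only; these are not study results.
placeholder_p = dict(zip(pairs, [0.004, 0.020, 0.600]))
adjusted = bonferroni_adjust(placeholder_p)
for pair, p in adjusted.items():
    print(pair, round(p, 3))
```

With three conditions there are three pairwise comparisons, so each raw p-value is multiplied by three before being compared with the significance level.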

Results

Tongue Curve Spline Drawing

Tongue Splines on Sequential Video Frames

There were considerable differences between the individual participants’ tongue splines on sequential video frames. Figure 4 shows the most commonly displayed splines for each participant while swallowing. With 10 ml of thin liquids, we were able to find and measure the depression depth for all participants. With 10 ml of thickened liquids, we were not able to measure the depression depth for seven participants. Measurements could not be obtained for four participants with 25 ml of thickened liquids and for two participants with 25 ml of thin liquids, and one participant had unclear splines (Figs. 5, 6).

Fig. 4

22-year-old female. Overlaid tongue curve splines (top) for four bolus types, and 3D time series (below). 25-ml thin bolus. In the 3D plots, the radial depression distance from tongue to palate is shown vertically. The anterior constriction forms first at fan line 10 (right), then the contact spreads back across the palate, in time, to PT20. a Retracting tongue contact with palate surface moves the bolus posteriorly. b Rising tongue’s surface within oral cavity also moves the bolus posteriorly

Fig. 5

Unmeasurable types. 19-year-old female in whom the tongue’s surface did not make a traveling concavity and the detected movement was only very slight

Fig. 6

Unmeasurable types. 24-year-old female with an anterior concavity at the start and a dorsal concavity later (just before the near-complete closure at the end of the transport). Between these times, however, the front/middle of the tongue did not form the clear concavity traveling in a posterior direction that might be expected. This may be because, unusually, she held the dorsal part of her tongue near to or touching the palate at the start of the process

Thus, there were qualitative differences between individual participants, and we defined the “measurable” and “unmeasurable” types qualitatively.

Measurable Type

The measurable type was defined as a swallowing spline from which the depression distance could be determined with certainty. As shown in Fig. 4, the tip of the tongue made contact with the palate, and the middle of the tongue formed a concavity groove at the start of swallowing. Afterward, the tongue’s surface gradually touched the palate from anterior to posterior. At the end, the entire tongue was in contact with the palate. This was described as a wave movement. Figure 4 shows the most common measurable type, in which we could find a clear bolus depression on the cupped tongue’s surface.

Unmeasurable Type

The unmeasurable type was defined as a swallowing spline from which the depth (and length) of the depression could not be determined. Two types were recognized in this study. Figure 5 shows the most common unmeasurable type, in which the middle of the tongue did not form a depression and the entire tongue maintained a cranial position at first. After that, the posterior tongue’s surface sloped down and lay back on the palate. It was only observed in a few cases with 10 ml of thickened liquid. From the start, the posterior tongue’s surface sloped downward and did not have a concavity groove or sharp slope. Afterward, the posterior tongue touched the palate.

Comparing the Depression Depths

To make the best use of the data, only data from the conditions involving 10 ml of thin liquid, 25 ml of thickened liquid, and 25 ml of thin liquid were compared. There were seven participants who had sufficient measurable data in each of these conditions. A total of 63 swallowing sequences were fitted, and the maximum value of the depression (concavity) is reported for each. Figure 7 shows box plots for each condition.

Fig. 7

The average maximum radial depression distance from palate to tongue’s surface among 10-ml thin, 25-ml thickened, and 25-ml thin (all measurable types)

There was a significant effect of volume on depression depth for the lower-viscosity liquid. The average depression depth was 20.9 mm (IQR: 4.3 mm) during the swallowing of 10 ml of thin liquid compared with 24.6 mm (IQR: 3.3 mm) during the swallowing of 25 ml of thin liquid (p < 0.001). Moreover, there was a significant effect of viscosity on depression depth. The average depression depth was 22.3 mm (IQR: 4.7 mm) during the swallowing of 25 ml of thickened liquid compared with 24.6 mm (IQR: 3.3 mm) during the swallowing of 25 ml of thin liquid (p < 0.01).

The mid-sagittal length of the depression was also measured, in terms of the number of fan lines on the radial measurement grid on which the depression could be observed.
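One way to express this length measure is sketched below in Python. The criterion for a fan line "showing" the depression is not specified in the text, so the threshold `min_gap_mm` and the choice to count the longest unbroken run of adjacent fan lines are assumptions made for illustration, as are the function name and the gap values.

```python
from typing import Sequence


def depression_length_in_fan_lines(
    distances_mm: Sequence[float], min_gap_mm: float = 1.0
) -> int:
    """Longest run of adjacent fan lines whose tongue-to-palate gap exceeds a threshold.

    `distances_mm` lists the radial tongue-to-palate distance on consecutive
    fan lines, ordered from anterior to posterior. `min_gap_mm` is a
    hypothetical cut-off below which the tongue is treated as touching the palate.
    """
    longest = 0
    current = 0
    for gap in distances_mm:
        if gap > min_gap_mm:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Illustrative gaps (mm) on nine consecutive fan lines, not study data.
gaps = [0.0, 0.5, 3.0, 9.0, 14.0, 8.0, 2.0, 0.4, 1.5]
length = depression_length_in_fan_lines(gaps)
print(length)
```

Counting only the longest unbroken run keeps an isolated gap elsewhere on the tongue from inflating the length of the main concavity; a simple count of all fan lines above the threshold would be an equally plausible reading.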

There was no significant difference in the number of fan lines among the conditions. The average number of fan lines was 17.4 (IQR: 2.2) for 10 ml of thin liquid, 15.9 (IQR: 3.3) for 25 ml of thickened liquid, and 18.0 (IQR: 2.8) for 25 ml of thin liquid (Fig. 8).

Fig. 8

The number of fan lines that formed a concavity groove according to the tongue’s depression distance

Discussion

Ultrasound tongue imaging (UTI) captures spatial aspects of swallowing as they change in time. Dodds et al. [3] reported that swallowing is divided into four phases: the preparatory phase, oral phase, pharyngeal phase, and esophageal phase. During the preparatory phase, the bolus is sized, shaped, and positioned on the tongue ready for swallowing, and during the oral phase, the bolus is propelled from the oral cavity into the pharynx. Thus, ultrasound is well suited to analyze the preparatory and oral phases because the floor of the mouth is an accessible and relatively small area, and ultrasound is able to provide valuable information about the changing location and shape of the tongue’s surface. This could be applied to other types of data, such as magnetic resonance imaging (MRI), computed tomography (CT), and other techniques. Moreover, given the frame rate of the UTI system described, and the wide field of view, the recording of almost complete tongue movements was achieved. The linear measurement of the depression depth allows a quantitative description of the tongue curve spline on B-mode imaging [10].

In this study, we used UTI to observe whole-tongue movements during swallowing in normal adults and recorded swallowing in each phase to determine common features for measurements or data normalization.

Test Liquid Swallowing

Free swallowing was performed because the amount of liquid in each mouthful is decreased with directed swallowing [11]. In addition, the amount of liquid used in the present study was somewhat larger than that generally used in swallowing research [12]. In free swallowing, one mouthful of water amounts to 30.87 ml in women and 34.48 ml in men. In contrast, approximately 7 ml of water, 6 ml of fluid apple juice, and 4 ml of viscous apple juice can be swallowed with directed swallowing using a cup. Consequently, our experiment compared free swallowing of 25 and 10 ml of liquid.

In our research, the swallowing type in which a depression was created was more commonly observed. This observation is similar to that of Kahrilas et al., who reported that liquid boluses are initially held in a chamber along the midline groove of the tongue during the oral phase of swallowing [13]. Subsequently, as reported in another study [14], the tongue of the subjects compressed the bolus against the palate and squeezed it backward in a conveyor belt-like fashion. However, our investigation specifically revealed that a greater depression depth is used for larger volumes and for more fluid liquids.

Qualitative Types

In our study, we divided swallowing patterns into two types based on qualitative analysis: measurable and unmeasurable types. Detailed measurements were taken for measurable type responses. However, the tongue movement patterns in swallowing for unmeasurable responses may require verification because dysphagia patients may experience many tongue movement difficulties in the oral preparatory phase. These patients may be characterized by oral limitations resulting from lingual weakness. We considered that unmeasurable images may indicate slight movements (Fig. 5) that do not reach the level of traveling concavity seen in Fig. 6. Therefore, we must develop a qualitative analysis system for unmeasurable responses.

Volumes

As the liquid volume increases, the depression depth increases. Hamlet et al. [12] reported evident trends that larger volumes were associated with a greater extent of grooving. Moreover, another investigation found a consistent trend that subjects used larger movement excursions and speeds when swallowing larger bolus volumes [15]. These findings suggest that larger liquid volumes are linked to dynamic movement. Another study found that maximum tongue grooving occurred near the onset of lingual propulsive activity. Moreover, videofluoroscopy was used to analyze the pattern of tongue base activity along the central groove associated with differences in bolus volume [13]. Similarly, we observed that for most subjects, swallowing occurred after the depression distance increased.

Viscosity

Thickened liquids are frequently used as an intervention to compensate for poor oral control and aspiration of fluid liquids [16, 17]. Nonetheless, little research has been performed on the actual depression depth of the tongue during swallowing. The results of our research indicate that the depression became significantly deeper for fluid liquids than for liquids with an agglomeration quality. Such agglomeration limited the spread of the liquid, and this characteristic likely affected the depression depth. In addition, according to a report by Miller, when there is no viscosity, a bolus is formed, and no great force is required for the initial transfer. In contrast, other reports state that viscosity requires some degree of force [18,19,20]. That is, for viscous liquids, propulsive tongue movements are required, and these movements likely have an effect on the depression depth as well. A larger depression is used for drinking larger amounts in general, although X-ray imaging research has found individual differences in depression depth [15].

Some of the performances of the subjects in our study could not be measured due to swallowing in a humped or flat manner, without creating a depression. Future studies should investigate the prevalence of such cases.

Limitation

The present study used only a UTI system for simple tongue shape analysis. UTI is good at showing the tongue’s surface and the bolus, but less good at revealing contact between the tongue and hard palate. This makes it difficult to define a temporal reference point comparable to other approaches in the literature. However, electromyography (EMG), electropalatography (EPG), or MRI may provide better means of detecting tongue position in the oral cavity, including the time point at which the tongue makes contact with the hard palate.

Conclusion

Our research revealed that subjects touched their tongue to their palate during drinking. We also observed that after intake, grooving precedes swallowing. In addition, fluid liquids tended to leave fewer traces, although an exhaustive investigation was not performed at this time. Nonetheless, this observation was expected, as a previous study found that viscosity slows bolus movement.

Future studies should use our methods to investigate swallowing speed. Our results already indicate that UTI can capture spatial aspects of swallowing. Initial observations also suggest that it will be possible to discuss and explain the dynamics of the tongue’s constriction and the movement of the constriction along the anterior to posterior plane. In time, as costs fall and the spatiotemporal resolution of 4D ultrasound improves, it will also become possible to capture volumetric data, but currently, we recommend further research using mid-sagittal high-speed ultrasound.