Introduction

Oropharyngeal dysphagia may be defined as any impairment of swallowing characterized by one or more alterations in the transit of food and liquid from the oral cavity to the esophagus [1, 2]. The Videofluoroscopic Swallow Study (VFSS) is considered the gold standard for diagnosing oropharyngeal dysphagia. It identifies residue along the oropharyngeal tract and material entering the larynx and lower airway, an important risk factor for aspiration pneumonia and malnutrition [1]. Nevertheless, the examination remains difficult to access in some health services and can be costly for the patient. Clinical evaluation by a specialized professional is therefore the diagnostic method most frequently used in clinical practice [1, 3, 4].

Because the larynx both protects the airway during swallowing and generates sound for voice, some clinical protocols test for a change in voice quality from before to after swallowing as an indicator of impaired airway protection in persons with dysphagia [3,4,5]. Material retained in the pharyngolaryngeal tract is thought to act as a barrier to the propagation of vocal resonance, since this change is not observed in individuals with normal swallowing [5, 6]. Clinical protocols describe this voice change in different ways. They typically consider vocal quality parameters such as the degree of change from the pre-swallow condition and vocal intensity, and they treat “wet voice” as a distinct parameter [3, 5, 6]. Wet voice has been the main term used in the dysphagia field to characterize the change and is described as a sound with a bubbling quality during post-swallow voicing [4, 7]. Nevertheless, vocal assessment methods for detecting dysphagia have been little studied, and the evidence is insufficient to support them as a reliable method.

In the field of voice, a traditional voice evaluation includes auditory-perceptual, aerodynamic, acoustic, and imaging parameters [8]. Auditory-perceptual evaluation relies on trained professionals. It is quantified with standardized, calibrated rating scales such as GRBAS and with categorical labels such as the attribution of a wet voice [5, 7,8,9]. Acoustic parameters, in turn, evaluate physical properties of the sound produced during phonation [8, 9]. For both methods, voice samples are typically obtained from sustained emission of a vowel (usually /a/), counting numbers, and reading texts. These tasks allow voice production variability to be analyzed and compared [8], and the same tasks are used to assess variability in voice production after swallowing [5,6,7]. Voicing varies when material is present in the phonatory tract. Different assessment approaches with high reliability therefore need to be tested to improve the accuracy of the methods used [4].

Although vocal change is used as one of the indicators of oropharyngeal dysphagia in clinical protocols [3, 4], studies disagree on its diagnostic accuracy relative to VFSS findings [3, 5,6,7], especially when time-linked audio and video data are not available.

The PRISMA-DTA guideline [10] provides guidance for reporting systematic reviews of diagnostic test accuracy studies. The diagnostic alternative under study is termed the index test, and the gold standard it is compared against is termed the reference standard. When diagnostic methods are compared, the index test must reliably distinguish affected from unaffected individuals relative to the reference standard, using the same assessment parameters and target outcomes.
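To make the index test versus reference standard comparison concrete, each participant can be cross-classified by post-swallow voice result (index test) and VFSS result (reference standard) in a 2×2 table, from which the usual accuracy measures follow. The sketch below assumes dichotomous results on both tests. The class name and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical cross-classification of an index test against a reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive individuals flagged by the index test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative individuals cleared by the index test
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: reference-positive share among test positives
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: reference-negative share among test negatives
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only
table = TwoByTwo(tp=18, fp=10, fn=12, tn=40)
print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
```

Unlike sensitivity and specificity, predictive values depend on how common the condition is in the sample. This is one reason the reference standard must be applied to every participant.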

The aim of this study was to perform a systematic review of published literature examining the relationship between post-swallow voice change and documented swallowing deficits.

Methods

This systematic review was conducted following the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy and is reported according to the PRISMA recommendations [10]. The study protocol was registered in PROSPERO (CRD42019131008). The research question was: “Is vocal quality after swallowing an accurate method for detecting oropharyngeal dysphagia?”

Criteria for Including Studies in This Review

Types of Studies

We included studies that compared phonation evaluation methods with VFSS results. Eligible designs were diagnostic accuracy studies and observational studies (cross-sectional and cohort).

Participants

Patients whose swallowing was evaluated by VFSS, provided they did not have a tracheostomy in situ or a surgical resection of phonatory tract structures.

Index Test

For vocal quality evaluation, we considered methods that collected vocal samples from sustained phonation or connected speech after swallowing liquid, pasty, or solid food samples. For data analysis, we used acoustic outcomes and auditory-perceptual changes.

Reference Standard

All patients were referred to VFSS for evaluation of swallowing performance. Among the variables of the examination procedure, the food and liquid consistencies evaluated were recorded, since consistency may affect swallowing performance [5].

Target Conditions

The VFSS outcomes used to characterize swallowing alterations were the presence of material residue in the hypopharynx, larynx, and lower airway.

Search Methods for Study Identification

We searched the PubMed/Medline, Cochrane, EMBASE, and Latin American and Caribbean Health Sciences (LILACS) databases using the search strategies presented in Online Appendix I. To minimize selection bias, other health-related bibliographic sources were also searched, such as Google Scholar, OpenGrey, ProQuest, dissertations, theses, and reference lists. No restrictions were placed on language or publication date.

Data Collection and Analysis

Selection of Studies

Two independent evaluators (KWS and ECR) initially screened the studies by title and abstract and included those that met the eligibility criteria. When inclusion was unclear, a third evaluator (RSR) was consulted to resolve discrepancies. Studies selected at this stage were read in full for the final inclusion decision.

Data Extraction and Management

After full-text reading and inclusion of the studies, two independent evaluators (KWS and ECR) extracted the data using a data extraction form developed for the study. A third evaluator (RSR) checked the extracted data. In case of doubts or incomplete data, the authors were contacted. The extracted data were author, publication year, country, study design, participant age, sample size by gender, and the underlying disease causing oropharyngeal dysphagia. For outcomes, we collected the following:

- consistencies and volumes of the foods and liquids evaluated;
- VFSS outcomes analyzed;
- vocal sampling methods;
- acoustic and auditory-perceptual vocal outcomes;
- main results described by the authors.

Assessment of Methodological Quality

The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) was used by two independent evaluators (KWS and ECR) to evaluate the quality of the studies. When there was no consensus between the evaluators, a third reviewer (RSR) was consulted.

Results

Study Selection

The search strategy yielded 271 records, of which 73 were duplicates and were excluded. After their removal, two independent authors read 198 titles and abstracts and selected 27 studies for full-text reading, with no disagreement between them. Of these, 17 studies [6, 7, 11,12,13,14,15,16,17,18,19,20,21,22,23,24,25] were included in this review. The PRISMA flow diagram (Fig. 1) presents the selection process and the reasons for exclusion at each stage.
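The counts in a PRISMA flow diagram should reconcile stage by stage. The following arithmetic check uses only the numbers reported above. The per-stage exclusion totals are derived by subtraction, not read from Fig. 1.

```python
# Record counts at each PRISMA stage, as reported in the text
identified = 271
duplicates = 73
full_text_assessed = 27
included = 17

screened = identified - duplicates                    # titles/abstracts read
excluded_at_screening = screened - full_text_assessed  # dropped on title/abstract
excluded_at_full_text = full_text_assessed - included  # dropped after full reading

print(screened, excluded_at_screening, excluded_at_full_text)  # 198 171 10
```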

Fig. 1

PRISMA flow diagram of the study

Characteristics of the Included Studies

The included studies were published from 1999 to 2018. Sample sizes ranged from 6 [11, 22] to 250 [24] individuals. The predominant underlying diseases were neurological (Table 1).

Table 1 Characteristics of included studies

Videofluoroscopic Swallow Study

Swallowing function was evaluated by VFSS using puree [13], liquid [12, 15, 19, 24], or both consistencies [6, 7, 11, 14, 16,17,18, 21, 23, 25]. Only two studies [20, 22] did not report the consistencies used. The number of swallows analyzed was:

- one swallow: five (29.41%) studies [11, 14, 15, 23, 25];
- two swallows: three (17.65%) studies [7, 17, 24];
- three swallows: three (17.65%) studies [13, 18, 19];
- not reported: six (35.29%) studies [6, 12, 16, 20,21,22].

Three (17.65%) studies [7, 15, 19] reported that the video was recorded at 30 frames per second. The others did not report this information.
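Every percentage in this section is a share of the same 17 studies. A small helper keeps these shares consistently rounded and checks that mutually exclusive categories add up. This sketch uses the swallow-count categories reported above. The helper name is illustrative, and percentages are rounded to two decimals rather than truncated.

```python
N_STUDIES = 17  # studies included in the review


def share(k: int, n: int = N_STUDIES) -> str:
    """Percentage of studies, rounded (not truncated) to two decimals."""
    return f"{100 * k / n:.2f}%"


# Number of swallows analyzed per study, as reported in the text
swallow_groups = {"one swallow": 5, "two swallows": 3,
                  "three swallows": 3, "not reported": 6}
assert sum(swallow_groups.values()) == N_STUDIES  # categories are exhaustive

for label, k in swallow_groups.items():
    print(label, share(k))
```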

Only one study [22] did not consider the presence of laryngeal penetration or tracheal aspiration as a VFSS outcome and took into account only the severity of the alteration. In addition to penetration or aspiration, four studies [13, 18, 19, 25] analyzed the presence of residue in other regions of the pharyngolaryngeal tract.

Voice Analysis

Voice samples were collected predominantly from sustained emission of the vowel /a/, either alone [6, 7, 11, 13, 15, 20, 21, 23,24,25] or complemented by other tasks [14, 17, 18]. Pitch elevation [16], non-continuous sounds [19], and syllables [22] were also used. Few studies described acoustic treatment of the environment to reduce noise:

- one study [6] obtained the vocal samples in a space separated from the VFSS room by a soundproof door;
- one [7] reported that it was not possible to control this aspect;
- one [19] used an algorithm to remove background noise from the samples.

The others did not report any noise control.

Seven studies [6, 7, 13, 15, 19, 20, 23] obtained vocal samples before and after swallowing to compare the usual vocal pattern with performance after swallowing. The other studies collected samples only after swallowing. In seven (41.18%) studies [11, 12, 14, 17, 22, 24, 25], the vocal samples were not obtained at the same time as the VFSS. The details are presented in Table 2.

Table 2 Diagnostic methods and results obtained by the studies

Five studies [6, 15, 20,21,22] analyzed voice with acoustic methods, seven [7, 12, 13, 17, 18, 24, 25] with auditory-perceptual parameters, and five [11, 14, 16, 19, 23] with both. Six studies (35.29%) [7, 15,16,17,18, 22] reported intra- or inter-rater reliability measures for voice analysis. The quantitative data described in the studies showed neither clinical nor statistical homogeneity, which precluded aggregating them in a meta-analysis. We therefore performed a descriptive analysis of the results of each study, presented in Table 2.
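The text does not state which reliability statistic each study used. Cohen's kappa is one common choice for two raters who assign categorical labels, such as the presence or absence of wet voice. This sketch implements that statistic for two raters. The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters labelling the same samples."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of samples")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater labelled independently at their own base rates
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical "wet" / "not" judgements of ten post-swallow voice samples
a = ["wet", "wet", "not", "not", "not", "wet", "not", "not", "wet", "not"]
b = ["wet", "not", "not", "not", "not", "wet", "not", "wet", "wet", "not"]
print(round(cohens_kappa(a, b), 2))
```

Here raw agreement is 80%. Kappa is lower because both raters label most samples "not", so much of that agreement would be expected by chance.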

The acoustic parameters investigated varied widely across studies. The most frequently used were the fundamental frequency (f0) [6, 14,15,16, 20, 23] and the noise-to-harmonic ratio (NHR) [6, 15, 20]. Six studies [11, 14, 15, 19, 21, 23] found no significant vocal changes in the parameters investigated. Three studies [6, 16, 22] identified changes in different acoustic parameters but did not analyze accuracy. Only one study reported sensitivity and specificity above 70%, for the relative average perturbation (RAP) and NHR parameters, both of which increased after swallowing. However, this finding differs from other included studies that found no statistically significant results [15] or found only a reduction in those parameters [6].
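Two of the acoustic measures named above can be illustrated from a sequence of glottal period durations. This sketch assumes that the periods have already been extracted from the sustained /a/, for example by a pitch tracker, which is out of scope here. RAP follows its common definition in voice analysis: the mean absolute difference between each period and the average of that period and its two neighbours, divided by the mean period. Mean f0 is approximated as the reciprocal of the mean period. The function names and period values are hypothetical.

```python
def mean_f0(periods_s: list[float]) -> float:
    """Approximate mean fundamental frequency (Hz) as 1 / mean glottal period (s)."""
    if not periods_s:
        raise ValueError("need at least one period")
    return len(periods_s) / sum(periods_s)


def jitter_rap(periods_s: list[float]) -> float:
    """Relative average perturbation (RAP) of consecutive glottal periods.

    Mean absolute difference between each period and the average of it and its
    two neighbours, divided by the mean period (returned as a fraction, not %).
    """
    if len(periods_s) < 3:
        raise ValueError("RAP needs at least three periods")
    deviations = [
        abs(periods_s[i] - (periods_s[i - 1] + periods_s[i] + periods_s[i + 1]) / 3)
        for i in range(1, len(periods_s) - 1)
    ]
    mean_period = sum(periods_s) / len(periods_s)
    return (sum(deviations) / len(deviations)) / mean_period


# Hypothetical period sequence (seconds) from a sustained /a/ near 125 Hz
periods = [0.0080, 0.0081, 0.0080, 0.0081, 0.0080]
print(f"f0 = {mean_f0(periods):.1f} Hz, RAP = {100 * jitter_rap(periods):.2f}%")
```

Software packages differ in how they detect periods and in their default thresholds. This is one reason why acoustic values from different studies are not directly comparable.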

Among the auditory-perceptual parameters, items of the GRBAS scale were the main ones analyzed, used either partially [11, 13, 14, 17, 25] or completely [19, 23]. The investigated variables were markedly heterogeneous, so the data could not be aggregated. Results were as follows:

- Two studies [11, 14] did not identify phonation changes after swallowing.
- Two others [13, 23] observed an increase in effort, with sensitivity and specificity of around 60% for residue in the pharyngolaryngeal tract and for penetration/aspiration.
- One study [16] found an increase in pitch after swallowing but did not report accuracy data.
- Another [25], considering any vocal change, reported 72% sensitivity and 67% specificity.

Nine studies included wet voice as an analysis outcome [7, 12, 13, 17, 18, 19, 23,24,25]. Two others [11, 14] mentioned it among their results without describing it in their methods sections.

Most of these studies [7, 12, 13, 18, 23] found no significant association between post-swallow wet voice and swallowing alterations. Among the studies that performed contingency analysis [17, 19, 25], sensitivity was low (14–50%), while specificity was higher (78–94%). One of these studies [25] aggregated wet voice with other vocal outcomes, so wet voice could not be analyzed separately.
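Sensitivity and specificity can be combined into likelihood ratios. These show how much a positive or negative post-swallow voice result would shift the probability of a VFSS-confirmed alteration. This sketch applies the standard formulas to the 72% sensitivity and 67% specificity reported for any vocal change [25], which gives an LR+ of about 2.2 and an LR− of about 0.42. The 30% pre-test probability in the usage line is hypothetical and does not come from the review.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)  # odds multiplier after a positive result
    lr_neg = (1 - sensitivity) / specificity  # odds multiplier after a negative result
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    odds = pre_test / (1 - pre_test) * lr
    return odds / (1 + odds)


lr_pos, lr_neg = likelihood_ratios(0.72, 0.67)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
# Hypothetical pre-test probability of 30%, not a figure from the review
print(f"after a positive result: {post_test_probability(0.30, lr_pos):.2f}")
```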

Bias Risk and Evidence Quality of the Included Studies

Figure 2 presents the results of the quality analysis, which is described next.

Fig. 2

Quality analysis using QUADAS-2

Patient Selection

Case–control designs can bias the analysis of data from individuals with alterations. Studies with this design were therefore rated at high risk of bias when they did not explicitly describe analysis blinded to the swallowing diagnosis. One (5.88%) study [22] was rated at high risk of bias because it compared individuals with swallowing alterations with healthy individuals without blinded analysis. All studies were rated at low concern for applicability, meaning that the profile of the patients studied was consistent with the objective of the review.

Index Test

For the vocal evaluation, five (29.41%) studies [6, 15, 21, 22, 25] were rated at high risk of bias. The main reason was that blinding of the voice raters to the VFSS findings was absent or unclearly reported, which may substantially compromise the analysis of the vocal samples. All studies were rated at low concern for applicability, since they addressed the question of the review.

Reference Standard

For the VFSS, only one (5.88%) study [22] was rated at high risk of bias because it did not adequately describe how the examination was performed. All studies were rated at low concern for applicability, because they used the examination as the reference standard for diagnosis.

Flow and Timing

Voice produced immediately after swallowing may be altered by material residue along the pharyngolaryngeal tract. Visualizing such residue on VFSS is therefore essential for relating this finding accurately to the post-swallow vocal change. On this basis, seven (41.18%) studies [11, 12, 14, 17, 22, 24, 25] were rated at high risk of bias because the vocal samples were not collected immediately after the swallow observed in the examination.
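Tallying the QUADAS-2 judgements per study, and not only per domain, shows how many studies have at least one high-risk rating. QUADAS-2 does not use a summary quality score, so this sketch only counts. The dictionary layout is illustrative, and the study numbers are the reference numbers cited in the four risk-of-bias subsections above.

```python
# Studies rated high risk of bias in each QUADAS-2 domain, as listed in the text
high_risk = {
    "patient selection": {22},
    "index test": {6, 15, 21, 22, 25},
    "reference standard": {22},
    "flow and timing": {11, 12, 14, 17, 22, 24, 25},
}
N_STUDIES = 17

for domain, studies in high_risk.items():
    print(f"{domain}: {len(studies)} ({100 * len(studies) / N_STUDIES:.2f}%)")

# Studies rated high risk in at least one domain
any_high = set().union(*high_risk.values())
print(sorted(any_high), len(any_high))
```

Ten of the 17 studies have at least one high-risk domain. One study [22] is rated high risk in all four domains.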

Quantitative Analysis

Eleven studies [7, 12, 13, 16,17,18,19,20, 23,24,25] presented accuracy data or enough quantitative detail to build contingency tables for accuracy calculations. However, because of the qualitative variability and quantitative heterogeneity of the studies, the data were not pooled in a meta-analysis, and no pooled accuracy estimates are reported. Instead, following the PRISMA-DTA guideline [10], we present a qualitative synthesis of the primary studies and the descriptive data reported by their authors.
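A primary study may report only sensitivity, specificity, and the number of participants with and without the VFSS-confirmed condition. In that case, the 2×2 counts needed for pooling can be back-calculated. This sketch performs that back-calculation. The helper and the figures are hypothetical, and rounding means that the recovered counts may differ slightly from the study's actual table.

```python
def reconstruct_2x2(n_diseased: int, n_healthy: int,
                    sensitivity: float, specificity: float) -> dict[str, int]:
    """Back-calculate 2x2 cell counts from reported accuracy and reference-standard totals.

    Uses Python's round(), which rounds exact .5 ties to the nearest even integer.
    """
    tp = round(sensitivity * n_diseased)  # true positives among VFSS-positive
    tn = round(specificity * n_healthy)   # true negatives among VFSS-negative
    return {"tp": tp, "fn": n_diseased - tp, "tn": tn, "fp": n_healthy - tn}


# Hypothetical study: 25 VFSS-positive, 40 VFSS-negative participants
cells = reconstruct_2x2(25, 40, sensitivity=0.72, specificity=0.70)
print(cells)
# Round-trip check: recomputed sensitivity should match the reported value
print(cells["tp"] / (cells["tp"] + cells["fn"]))
```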

Discussion

Most of the studies included in this review used the penetration–aspiration scale as the VFSS outcome for investigating changes in voice production. On eight levels, this scale grades how deeply material enters the laryngeal airway and/or is aspirated into the lower airway, and how the individual responds to it [26]. In laryngeal penetration, material enters the laryngeal vestibule without passing below the vocal folds. In tracheal aspiration, material passes below the vocal folds into the lower airway. At both levels, the scale also considers whether the individual responds to and ejects the material. Although these two findings represent distinct levels of alteration and functional impact, most studies analyzed penetration and aspiration in aggregate. This prevents a precise analysis of what post-swallow vocal change objectively indicates.
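The grouping in this sketch follows the commonly used reading of the eight-point scale: 1 means no airway entry, 2–5 penetration, and 6–8 aspiration. Keeping the three categories distinct, rather than collapsing every score above 1 into a single "abnormal" flag, would allow vocal change to be related to penetration and aspiration separately. The enum and function names are illustrative.

```python
from enum import Enum


class PASCategory(Enum):
    NO_AIRWAY_ENTRY = "no airway entry"
    PENETRATION = "laryngeal penetration"
    ASPIRATION = "tracheal aspiration"


def pas_category(score: int) -> PASCategory:
    """Map a penetration-aspiration scale score (1-8) to its broad category."""
    if not 1 <= score <= 8:
        raise ValueError(f"PAS score must be 1-8, got {score}")
    if score == 1:
        return PASCategory.NO_AIRWAY_ENTRY  # material does not enter the airway
    if score <= 5:
        return PASCategory.PENETRATION      # enters the airway, stays above the vocal folds
    return PASCategory.ASPIRATION           # passes below the vocal folds


# Keeping penetration and aspiration separate instead of one "abnormal" flag
scores = [1, 3, 5, 6, 8, 2]
print([pas_category(s).value for s in scores])
```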

Without consensus on which aspects to investigate, the vocal parameters studied cannot be evaluated accurately. Moreover, the viscosity and volume of the food and liquid offered impose different neuromotor demands that modify swallowing performance, so they must be considered when analyzing each outcome [3, 4, 27]. For these reasons, the investigation methods are too heterogeneous to support accurate, standardized identification of swallowing alterations from post-swallow vocal change. Attempts to pool the data show high clinical heterogeneity, confirmed by statistical heterogeneity.

The clinical objectives of the studies were not standardized, and there was no consensus protocol for the voice investigation method. Phonation should be observed before and after swallowing to determine whether it departs from the usual pattern. The swallowing findings should also be observed at the same time as the voice is sampled, in order to establish diagnostic accuracy [5, 26]. Few of the studies included in this review adopted these measures, which increases their risk of bias. Dysfunction observed in one swallow may not occur consistently across all swallows [27]. To properly associate the two, the swallowing alteration and the vocal change must therefore be investigated simultaneously. Although essential for accuracy analysis, simultaneous investigation was infrequent across these studies.
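Simultaneous observation presupposes that audio and video share a common time base. This sketch checks, for each swallow, whether a post-swallow voice sample was produced shortly after the swallow seen on VFSS. It assumes time-linked recording on a single clock and uses the 30 frames-per-second rate reported by some of the included studies. The 5-second window and the function names are hypothetical and do not come from the review.

```python
FRAME_RATE_FPS = 30.0  # frame rate reported by some included studies


def frame_to_seconds(frame_index: int, fps: float = FRAME_RATE_FPS) -> float:
    """Convert a VFSS frame index to seconds from the start of the recording."""
    return frame_index / fps


def voicing_follows_swallow(swallow_end_frame: int, voicing_onset_s: float,
                            max_gap_s: float = 5.0) -> bool:
    """True if voicing starts after the swallow ends and within max_gap_s of it.

    Assumes audio and video share one clock (time-linked recording);
    max_gap_s is a hypothetical threshold, not a value from the review.
    """
    gap = voicing_onset_s - frame_to_seconds(swallow_end_frame)
    return 0.0 <= gap <= max_gap_s


# Swallow ends at frame 150 (5.0 s); voicing onsets at 6.2 s, 12.0 s, and 4.0 s
for onset in (6.2, 12.0, 4.0):
    print(onset, voicing_follows_swallow(150, onset))
```

A voice sample that falls outside the window cannot be attributed to the residue seen in that specific swallow. Per the Flow and Timing criterion above, this is what places a study at high risk of bias.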

Vocal evaluation methods and parameters were also heterogeneous across studies. Most studies of acoustic data did not identify parameters that varied significantly with swallowing, and those that did reported divergent results [6, 20]. Among the auditory-perceptual parameters, change in phonation, particularly increased effort, appears to be the most consistent indicator of material along the phonatory tract after swallowing [13, 17, 23]. However, the studies are few, and their accuracy is too low to support its use as an isolated diagnostic method. The heterogeneity of the reference examination parameters described above is a further limitation. The wet voice parameter in particular has no standard for classifying, measuring, or analyzing the outcome. No association was found between wet voice and the parameters indicating swallowing alteration [7, 12, 13, 18, 23], and its ability to detect individuals with alterations was low (low sensitivity). However, wet voice was highly specific, correctly identifying individuals without the condition [24].

Several limitations of the included studies prevented a consensus on recommending vocal evaluation as an accurate method for identifying swallowing alterations. These include heterogeneity in the vocal evaluation methods, in the outcomes evaluated on VFSS, and in the food and liquid consistencies, as well as the methodological quality of the studies. Vocal evaluation is therefore not recommended as a reliable substitute for VFSS findings in identifying swallowing alterations and dysphagia.

Parameters of post-swallow vocal change should be used with caution in clinical protocols to avoid measurement bias. Moreover, investigation methods need to be standardized to allow adequate accuracy analysis of the parameters investigated, especially clinical parameters such as those evaluated in this systematic review.

Conclusion

It is not possible to reach a consensus on recommending vocal evaluation as an accurate method for identifying swallowing alterations. The reasons are the heterogeneity of the vocal evaluation methods, of the outcomes evaluated in the VFSS examination, and of the food and liquid consistencies, as well as the methodological quality of the studies.