Introduction

Laryngopharyngeal reflux (LPR) is the backflow of gastric contents into the laryngopharynx, where it comes into contact with tissues of the upper aerodigestive tract [1]. From an epidemiological standpoint, LPR is one of the most frequently encountered chronic inflammatory conditions of the larynx, affecting 8–20 % of the general population [2], 4–10 % of patients in Ear, Nose and Throat (ENT) consultation [3], 1 % of patients in primary care practices [4] and up to 75 % of patients with refractory ENT symptoms [5]. This clinical entity is well known to considerably affect patients’ quality of life, altering sleep [2] and daily activities and reducing the speaker’s communicative effectiveness [6]. Specifically, LPR may concern 50–78 % of the population with voice complaints and 91 % of the cases of voice disorders in the elderly [7–9]. Indeed, the inflammatory reaction induced by the backflow of gastric components into the aerodigestive tract causes the following: (a) hypersecretion in the pharyngeal space; (b) mucous accumulation; (c) a post-nasal drip sensation; (d) throat clearing; and (e) chronic coughing that can provoke choking. Coughing, throat clearing and the direct effects of acid gases can worsen laryngeal lesions, resulting in alterations in the constitution of the vocal folds, contact ulcers, and/or granulomas [10], generating usual LPR symptoms, such as hoarseness, globus pharyngeus, and sore throat [11]. For many practitioners, hoarseness is considered the main symptom, present in 65–95 % of LPR subjects [12, 13]. Hoarseness generated by inflammation and/or vocal fold lesions can lead to functional complaints, including vocal forcing, forcing sensations, vocal fatigue, musculoskeletal tension, and hard glottal attacks. In addition, vocal forcing, throat clearing and cough promote the development of a vicious cycle, maintaining diffuse inflammation and de facto lesions, as well as clinical findings, such as hoarseness.
Hence, voice quality has increasingly been used as an outcome to assess the effectiveness of medical or surgical treatment [14]. To date, a few scientific studies have assessed voice quality at diagnosis and after treatment, but very few have attempted to describe the specific pathophysiological mechanisms underlying the development of communicative disability and its resolution with treatment. Mapping and reporting these functional changes at different time points in LPR disease is important to better understand the vocal and functional behaviour changes occurring after the development of hoarseness. Currently, except for some theoretical evidence, the precise mechanisms underlying the development of voice disorders have not yet been well documented [14, 15].

The aim of this review was to investigate systematically the effects of LPR disease and its treatments on voice quality. First, we conducted an overview of the studies that: (a) evaluated voice quality modifications at the time of the LPR diagnosis and (b) assessed the effect of LPR treatment on the voice. Second, we also attempted to describe better the pathophysiological mechanisms underlying the development of communicative disability. This review was conducted according to the PRISMA checklist for reviews and meta-analysis [16].

Methods

Types of studies, participants, outcomes and interventions

The primary studies group included case control studies to compare pathological patients with healthy subjects at baseline (primary assessment). The secondary studies group included randomised, double-blind trials (RCTs), prospective or retrospective, controlled or uncontrolled studies, and case series with adequate sample sizes (N > 10).

The clinical diagnosis of LPR remains difficult and controversial; there is no stated consensus [14]. For that reason, we wanted to stay as inclusive as possible in terms of the LPR diagnostic method used by the studies. To be included in this review, patients had to have a clear diagnosis of laryngopharyngeal reflux based on:

  1. The presence of ENT symptoms for at least 1 month and/or gastroesophageal complaints plus laryngoscopic signs, both optionally described by a Reflux Symptom Index (RSI) > 13 and a Reflux Finding Score (RFS) > 7 [17]; or

  2. A positive result using 24-h multiple-probe pH-metry with or without an oesophageal probe, coupled with LPR symptoms and signs. A positive result includes the following: (a) pH ≤ 4 measured with a probe positioned initially between 1 and 3 cm above the upper oesophageal sphincter; one pharyngeal episode is sufficient regardless of the pH measurement possible with the oesophageal probe; and (b) a 3-point drop in pH resulting in a pH of less than 5.

However, we did not exclude studies that did not obtain pH measurements because: (a) it was not yet enabled (cut-off); (b) several episodes of reflux can occur in healthy patients [18]; and (c) intermittent reflux might not occur during the test period, leading to a bias of diagnosis [19, 20].
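The two inclusion routes above can be read as a simple decision rule. The following is a minimal sketch of one possible reading, not the procedure used by the reviewers: all names (`Patient`, `is_positive_event`, `is_eligible`) are hypothetical, "1 month" is taken literally as a threshold in months, the two pH conditions (a) and (b) are assumed to be alternatives, and the optional RSI and RFS cut-offs are not encoded.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative assumptions."""
    ent_symptom_months: float = 0.0
    gastroesophageal_complaints: bool = False
    laryngoscopic_signs: bool = False
    # One (pH before the event, lowest pH reached) pair per pharyngeal event.
    pharyngeal_events: List[Tuple[float, float]] = field(default_factory=list)


def meets_clinical_criterion(p: Patient) -> bool:
    """Route 1: symptoms for at least 1 month and/or GE complaints, plus signs."""
    symptoms = p.ent_symptom_months >= 1 or p.gastroesophageal_complaints
    return symptoms and p.laryngoscopic_signs


def is_positive_event(ph_before: float, ph_nadir: float) -> bool:
    """Assumed reading: pH <= 4, OR a drop of >= 3 points ending below pH 5."""
    reaches_acid = ph_nadir <= 4
    sharp_drop = (ph_before - ph_nadir) >= 3 and ph_nadir < 5
    return reaches_acid or sharp_drop


def meets_ph_criterion(p: Patient) -> bool:
    """Route 2: one positive pharyngeal event, coupled with symptoms and signs."""
    positive = any(is_positive_event(b, n) for b, n in p.pharyngeal_events)
    any_symptoms = p.ent_symptom_months > 0 or p.gastroesophageal_complaints
    return positive and any_symptoms and p.laryngoscopic_signs


def is_eligible(p: Patient) -> bool:
    """A patient is included if either route is satisfied."""
    return meets_clinical_criterion(p) or meets_ph_criterion(p)
```

Under these assumptions, a patient with two weeks of symptoms and laryngoscopic signs would be included only if a positive pharyngeal pH event was also recorded.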

The authors had to exclude several conditions leading to similar symptoms and signs such as ENT infections in the previous month, addictions to tobacco, alcohol and other identifiable causes of laryngeal symptoms. Ideally, allergic patients were also excluded, but some authors believed that controlled or non-active allergies cannot skew the diagnosis [14]. Publications focusing on singers or children were also not included. The symptom and sign outcomes could consist of clinical questionnaires or simply history/observation taken by the clinician. However, papers had to study the vocal quality of patients using accurate, subjective and/or objective assessments at baseline and/or throughout the treatment. Hence, publications not providing at least one precise datum on the evaluation of voice quality were not included. Concerning interventions, surgical and non-surgical treatments were included. We obtained the agreement of our institutional review board (EH, OM034).

Search strategy

We conducted a literature search to identify all articles about LPR speech characteristics written in English, French and other languages and published between January 1990 and December 2015. The databases used were the Biological Abstracts, BioMed Central, Cochrane, PubMed and Scopus databases. The keywords used were “reflux”, “laryngopharyngeal”, “laryngitis”, “voice”, and “hoarseness”. These words were combined in distinct ways to generate broad research results. In addition, references were obtained from citations within the retrieved articles or in review publications. To avoid multiple inclusions of patients, we checked for the age, sex, author and geographic area whenever these data were available. When patients were described in more than one publication, we used only the data reported in the larger and more recent publication.

Data selection, extraction and analysis

The research protocol and data selection, extraction, and analysis were developed a priori. All of the retrieved references were manually sorted to extract all of the descriptions of patients meeting the diagnosis of LPR. Two independent authors (JRL and PC) screened and selected each study that had database abstracts, available full texts or titles referring to the condition. If the topic of the publication was unclear, full texts were reviewed. The authors were not blinded to the paper authors, their institutions, or the journals, and the abstracts were reviewed without considering the number of LPR patients reported. No publications were excluded on the basis of quality. Review articles were also subjected to detailed analysis to extract any relevant references.

The two authors assessed the articles included for year of publication, the quality of the trial, the methods, and the evidence level. All of the studies were assessed for the following characteristics: (a) the number of patients; (b) inclusion and exclusion criteria; (c) whether randomisation was performed, the adequacy of the process of allocation and the comparability of the groups; (d) the risk of epidemiological bias; (e) the treatment regimen; (f) the follow-up; and (g) the quality of the outcome assessment. Discrepancies were resolved by discussion with senior otolaryngologists (CF and SS). The grade of recommendation (ranging from Ia to V), according to the Oxford Centre for Evidence-Based Medicine evidence levels [21], was determined for each publication. Risk of bias was assessed using the Tool to Assess Risk of Bias in Cohort Studies developed by the Clarity Group and Evidence Partners [22]. For each voice quality evaluation, methodological procedures were assessed for the following characteristics:

  1. Subjective voice evaluations: the characteristics concerning the assessment of voice disorders [tools used, practitioner(s) assessing voice characteristics, the use or not of a listening jury and the pronunciations used, i.e., phonetic text, sustained vowels, etc.]; and

  2. Acoustic measurements: the software, the vowel choice and duration, the vowel sample number recorded and analysed, the different characteristics concerning the utilisation of the microphone (distance from the mouth, the sound-treated room), and the vowel sample portion on which the acoustical measurements were obtained.

Results

Search results

The database search results yielded 145 relevant publications in Scopus, 137 relevant publications in PubMed, 13 relevant publications in BioMed Central, 2 relevant publications in Biological Abstracts, and 2 relevant publications in Cochrane (Table 1). From these, we selected 25 pertinent references, accounting for 1483 LPR patients and 587 healthy control subjects (Fig. 1). Of the 25 articles, only 5 controlled studies were found following our inclusion criteria for the primary assessment, accounting for 465 LPR patients and 282 healthy controls. Concerning the secondary assessment, we selected 20 articles, including 11 prospective, uncontrolled case series describing 438 patients, 7 prospective, controlled case studies describing 786 subjects, and only 2 double-blind, placebo-controlled studies describing 71 patients. One study was excluded owing to overlapping patient populations [23]. A detailed description of all of the studied papers and the distribution of cases are displayed in Tables 2 and 3. Among the 25 papers, all of them were available in English. The detailed search strategy is shown in Fig. 1.

Table 1 Result of the literature search
Fig. 1 Flow chart showing the process of article selection for this study

Table 2 Overview of study designs, patient characteristics, inclusion criteria and assessment tools of the selected case control studies
Table 3 Overview of study designs, patient numbers, inclusion criteria, treatment outcomes, therapeutic procedures and follow-up period of the selected prospective studies

LPR patient characteristics, treatments and follow-ups

A total of 1483 LPR patients were included in the present systematic review. The sample sizes ranged from 13 to 278 subjects. Age and sex were reported in 18 and 22 of 24 studies, respectively, and treatment and follow-up were reported in all of the prospective studies. Fifty-six percent of all of the patients were women, and the average patient age at diagnosis was 49 years (ranging between 18 and 86 years). Four different medical regimens were used, accompanied or not by diet and behavioural changes:

  1. Proton Pump Inhibitors (PPIs) once a day or b.i.d.;

  2. PPIs in association with other drugs including H2 receptor antagonists (once per day to q.i.d.), and over-the-counter antacids (t.i.d. to q.i.d.);

  3. PPIs combined with speech therapy (once weekly or b.i.w.); and

  4. Surgical procedures ± PPIs.

The medical treatment duration ranged from 4 to 20 weeks, with an exception for the patients who underwent surgical procedures followed by medical treatment (Fig. 2). A surgical procedure, associated or not with PPIs, was used in four studies, generally for nonresponders (Table 3). An average latency period of 50.3 weeks between the surgical procedure and the voice assessment was found (range 12–108 weeks). Nine publications reported associations of medical or surgical therapy with diet and lifestyle changes. Two authors reported that they did not give diet and behavioural change recommendations, and 9 did not provide any information about regimens.

Fig. 2 Overview of the duration of treatment and follow-up period of the selected studies. The majority of studies adopted a treatment period of 12 weeks. Studies that evaluated the effect of treatment after long periods involved a surgical treatment

Outcomes of included studies

A variety of methods and tools were used to determine voice quality. In the primary studies group, two studies provided subjective and objective voice assessments, no studies provided only subjective voice assessments, and three publications provided only objective voice assessments. The LPR diagnosis was based on oesophageal pH-metry in two publications, RSI and/or RFS was used in one publication, and the presence of signs and symptoms following the clinical experience of the physician was used in 3 publications. In prospective studies, 15 publications assessed subjective and objective items of voice quality, 3 provided only subjective assessments, and 2 provided only objective assessments. The diagnosis was based on the presence of signs and symptoms in the majority of publications (N = 11). Six centres used oesophageal pH-metry to detect the presence of gastroesophageal reflux disease (GERD), which, combined with ENT symptoms, led to the LPR diagnosis. Four studies used both the RSI and RFS scales with the Belafsky criteria, while only two studies used 24-h double-probe ambulatory pH-metry. Within these 25 studies, 4 patient-based instruments (Voice Handicap Index (VHI), RSI, Composite Laryngeal Score, Vocal Dysfunction Degree) and 4 clinician-based instruments (Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS), RFS, acoustic parameters and aerodynamic measures) were used. All 25 studies reported their outcomes by means of statistics. In terms of subjective voice assessment, 15 articles demonstrated a significant improvement after medical treatment (N = 11), medical treatment and speech therapy (N = 2), or surgical treatment (N = 3). In terms of objective voice assessment, of the 16 prospective studies using objective voice outcomes, 14 reported at least one significant improvement after medical treatment (N = 9), medical treatment and speech therapy (N = 2), or surgical treatment (N = 3).
All of the details are available in Tables 2 and 3. An overview of the measurement instruments/tools on voice quality used in the selected studies is described in Table 4.

Table 4 Overview of the measurement instruments/tools on voice quality used in the selected studies

Concerning acoustic measurements, microphone use varied from one study to another (Tables 2, 3). MDVP® (Kay Elemetrics Corp., Pine Brook, NJ, USA) was the most frequently used software to measure acoustic parameters (N = 9), followed by Dr Speech (Tiger DRS, Inc., Seattle, WA, USA) (N = 2), C-Speech (C-Speech, P. Milenkovic, Madison, WI, USA) (N = 1), Computerized Speech Lab (KayPentax, Montvale, NJ, USA) (N = 1), Praat (Paul Boersma and David Weenink, Phonetic Sciences Department, University of Amsterdam, the Netherlands) (N = 1), and Speech Studio Software (Laryngograph Ltd., London, England) (N = 1). In all of the publications, the vowel used to measure acoustic parameters was a sustained /a/, but the duration of the sustained vowel and the number of samples recorded and analysed varied from one study to another. The duration of the sustained vowel ranged from 2 s to the maximum phonation time. The number of samples recorded ranged from 3 to 4 trials, and the number analysed ranged from 1 to 3 samples. The portion of the recording on which acoustic parameters were measured also varied from one study to another:

  1. Most authors did not provide any information about the selected portion (N = 6);

  2. Some authors chose the central two or three seconds of the signal (N = 4);

  3. Others measured acoustic parameters on the entire signal samples (N = 3); and

  4. Others determined the most stable portion of the signal on which they measured the acoustic parameters (N = 2).

Many studies did not provide information about the software used, the sample analysed and/or recorded and/or the microphone used, and/or the sample portion choice on which acoustic measurements were performed.
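To make the portion-selection step concrete, here is a minimal sketch of the "central seconds" strategy listed above. It is an illustration only: the function name `central_segment`, its parameters and the default of 2 s are assumptions, and none of the software packages cited in this review is claimed to work this way.

```python
from typing import List, Sequence


def central_segment(samples: Sequence[float],
                    sample_rate_hz: float,
                    segment_seconds: float = 2.0) -> List[float]:
    """Return the central `segment_seconds` of a sustained-vowel recording.

    If the recording is shorter than the requested duration, the whole
    signal is returned (equivalent to the "entire signal" strategy).
    """
    n_wanted = int(round(segment_seconds * sample_rate_hz))
    if n_wanted <= 0:
        raise ValueError("segment_seconds and sample_rate_hz must be positive")
    n_total = len(samples)
    if n_wanted >= n_total:
        return list(samples)
    # Discard an equal number of samples from the onset and the offset.
    start = (n_total - n_wanted) // 2
    return list(samples[start:start + n_wanted])
```

Because the onset and offset of a sustained vowel tend to be less steady than its middle, two studies that analyse different portions of the same recording can report different perturbation values, which is why the choice needs to be reported.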

Methodological quality of the selected studies (evidence level) and bias

All of the publications were evaluated for study design, quality, and level of evidence. Concerning the grade of recommendation, our search found 16 trials with a IIb evidence level, 7 studies with a IIa evidence level, and only 2 RCTs with a Ib evidence level. Regarding the risk of bias, sampling bias was present in 6 cohort studies in which patient inclusion required oesophagitis or demonstrated GERD [13, 24–28]. The department in which the patients were recruited, i.e., gastroenterology or otolaryngology, also constituted a bias, given the different profiles of patients (with or without GERD). In two studies, it was not clearly stated whether patients underwent classical pH monitoring, double-probe pH monitoring or oropharyngeal pH-metry [29, 30]. In addition, some controlled studies did not provide groups for comparability analysis [29, 31–35]. The systematic reliance on medical records and patients’ self-reports for outcomes also indicated a higher risk of bias [27, 36, 37]. An observational bias could also not be excluded, given the equipment used and the quality of the images used to assess the RFS (laryngoscopy without stroboscopy). Finally, the main bias of studies assessing acoustic parameters remained the heterogeneity of the methods used for measuring them [23, 35].

Discussion and evidence synthesis

Our systematic review included 24 publications from 1990 to 2015, covering a period of 25 years. Most were published within the last 15 years, suggesting that voice outcomes attracted little interest as a measure of treatment effectiveness in LPR disease and GERD before then. Another explanation is that it is only in the last two decades that physicians have realised that laryngopharyngeal reflux is a clinical entity distinct from GERD [19]. Developments in medical technology and their increased availability to clinicians might also explain the large number of studies in the last decade.

Subjective voice assessments

First, our review reports that, at baseline, LPR patients can present significant subjective voice disorders compared with healthy subjects. Patients would perceive their voices as unusual, while clinicians would perceive them essentially as hoarse [13, 31]. In daily life, voice disorders alter quality of life, as reflected in VHI scores, which seem worse among LPR subjects, especially in the physical domain [13, 38]. Although few studies have used this tool, the VHI score seems to decrease after fundoplication [28] or with PPI treatment [25, 39], an effect enhanced by the combination of PPIs and speech therapy [40]. Particular attention should be paid to the patient’s mental condition when interpreting the results of this score because it has been suggested that anxiety and depressive symptoms can influence subjective responses to quality of life and symptom scales, such as the emotional VHI scores [41, 42]. Moreover, Elam et al. suggested that patients with a high VHI emotional score should undergo screening for depressive symptoms [42]. Only one study assessed VHI and the well-being of patients [25]. Among the arsenal of subjective voice tools used by practitioners, the GRBAS remains the standard clinical scale. As mentioned above, hoarseness has been unequivocally recognised as closely related to reflux for decades and remains the main subjective voice disorder in LPR patients [19, 43]. As demonstrated in this review, hoarseness is naturally one of the most frequently used subjective voice outcomes. In most studies, hoarseness improved with PPI treatment [36, 37, 44]; the improvement was better if PPI treatment was combined with speech therapy [35, 40]. However, note that the way the evaluation was used and the raters who performed it differed from one study to another, which complicates cross-study comparisons and represents an important bias.
In resistant LPR subjects, classical surgical antireflux procedures seemed to improve hoarseness positively at 1 year after surgery [45]. Thus, it appears that, except for strain, all of the characteristics of the GRBAS scale also seemed to improve after PPI therapy and even more when PPIs were combined with speech therapy. This finding could be largely due to better control of the source of irritation, i.e., gastric juice, and the effects of speech therapy on vocal dysfunctions, which develop after hoarseness related to LPR [35]. The authors provided two possible explanations. First, LPR disease induced perceived voice disorders, such as hoarseness, leading to subsequent excessive muscular tension. They emphasised that strain might be due to excessive constriction of the laryngeal musculature and perceived breathiness due to abnormalities in vocal fold adduction. According to these considerations, speech therapy could reduce vocal hyperfunction and promote muscular relaxation, thus reducing forceful vocal fold adduction, while PPIs controlled gastric irritation. Second, they explained that the increases in strain and breathiness perception in voice quality could be due to potential interrelationships between categorical variables, such that a decrease in the perception of one variable caused an increase or a decrease in the perception of another. This explanation was supported by Millet and Dejonckere, who reported that the presence of a strong breathy component influenced the rating of a rough component and vice versa [46]. Unfortunately, very few details have been provided in studies concerning speech therapy protocols; moreover, the duration and intensity of the therapy vary from one study to another. 
Other voice disorder characteristics, such as musculoskeletal tension, hard glottal attack, glottal fry, restricted tone placement, chronic or intermittent dysphonia, vocal fatigue, and voice breaks, were studied in one publication [31] and could also lead to contact ulcers caused by improper voice use, itself due to LPR hoarseness [47]. Further investigations are needed to study these perceptual characteristics [48].

Acoustic measurements

It is well known that subtle voice changes can be difficult to detect with the usual subjective assessments of clinicians. Thus, many studies have used acoustic parameters to study pathophysiology or to measure the effectiveness of treatment. They can be non-invasive, accurate and powerful measurements of voice quality, but only if used in conjunction with other measurements, such as perceptual ratings [49]. Our review reported that 16 publications used acoustic parameters to measure the effectiveness of treatment (Table 3); the majority showed significant acoustic improvement after treatment. Jitter and shimmer provide objective information because they are closely related to the stability of mucosal movement of the vocal folds, which is influenced by the symmetry of the vocal cords, airflow and the amount of mucus [50]. More precisely, they are the main measurements that reflect instability in vocal fold vibrations [51]. In our primary studies group, three authors found a significant difference in jitter values in female LPR patients compared with controls [13, 32, 33], while Ross et al. did not find an objective, significant difference between groups [31]. The unexpected results of the study by Ross et al. could be due to recruitment bias, in which many of the patients recruited had muscle tension dysphonia, which initially indicates vocal lesions and speech fatigue not directly caused by LPR. Many prospective studies using jitter as a treatment outcome (Shaw et al., Lechien et al., Jin et al.) showed an improvement in the different jitter values after medical treatment [23, 38, 52], medical treatment plus voice therapy (Vashani et al. [35]) and surgical procedures (Ogut et al. [26]). Hamdan et al., Sereg-Bahar et al. and Selby et al. were not able to demonstrate an improvement in jitter after medical treatment ± speech therapy [24, 39, 53]. It must be noted that the mean follow-up of the studies of Shaw et al., Jin et al. and Lechien et al. (16.3 weeks) was longer than that of the studies that did not observe significant changes (7 weeks) [24, 39]. It cannot be excluded that significant results could appear with longer follow-up times [54, 55]. In addition, the study of Selby et al. was characterised by a small number of patients, leading to a limited analysis of treatment effect [53].

Concerning shimmer, a majority of results showed that LPR subjects seemed to have alterations in the short-term perturbation of the intensity compared with controls [32]. Moreover, most authors showed shimmer value enhancement after medical treatment ± speech therapy (Shaw et al., Vashani et al., Lechien et al., and Sereg-Bahar et al.) [23, 35, 38, 39] or surgical procedures (Ogut et al. [26]). Like that of Hamdan et al., the study of Selby et al. did not find an improvement in shimmer values [24, 53]. The previous remarks concerning the short follow-up time of these studies also apply here. As encountered in many other diseases, abnormal values of jitter and shimmer might be correlated with the presence of hoarseness, but currently no studies have reported this finding in LPR disease [38]. When we considered acoustic measurements as treatment outcomes, there was very little evidence in the few studies available about the superiority of a treatment combining speech therapy and PPIs [40].

The last category of acoustic parameters concerns the signal noise measurements, which consist of an average ratio of the inharmonic energy to the harmonic spectral energy [56]. Akylidis suggested that voice turbulence index (VTI), a parameter correlated with the turbulence caused by incomplete or loose adduction of the vocal folds, might reflect mild changes in the vocal fold mucosa earlier than other acoustic parameters [33]. These authors also reported that noise-to-harmonic ratio (NHR) values did not differ between controls and male LPR patients, unlike in female patients, in whom there was a significant difference, suggesting a sex difference in the sensitivity of the laryngeal mucous resistance [33]. In the same manner, Oguz et al. did not find statistically significant differences in NHR values between groups [32]. Furthermore, it would appear that the various treatments used substantially improve signal noise measurement values [26, 28, 35, 39, 52, 53]. Only two studies did not find signal noise improvement after medical treatment [24, 38]. Other acoustic measurements were reported in a few publications [57, 58] but remain underexplored in LPR disease.
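For readers less familiar with the perturbation measures discussed above, the following sketch shows one widely used "local" definition of jitter and shimmer: the mean absolute difference between consecutive cycles, divided by the mean value, expressed as a percentage. This is an assumption for illustration; the reviewed studies used different software packages, each with its own variants of these measures, and the function names here are hypothetical.

```python
from typing import Sequence


def _relative_perturbation_percent(values: Sequence[float]) -> float:
    """Mean absolute cycle-to-cycle difference divided by the mean, in %."""
    if len(values) < 2:
        raise ValueError("at least two consecutive cycles are required")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / mean_value


def local_jitter_percent(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the glottal period durations (seconds)."""
    return _relative_perturbation_percent(periods_s)


def local_shimmer_percent(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the peak amplitude of each cycle."""
    return _relative_perturbation_percent(peak_amplitudes)
```

A perfectly periodic signal gives 0 % for both measures; any irregularity in period or amplitude from one cycle to the next raises them, which is why they are sensitive to the analysed portion of the vowel and to the number of cycles it contains.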

An explanation for the potential observed improvement in acoustic parameters after medical treatment involves various pathophysiological mechanisms, which are not yet clear. Theoretically, the factors most likely to alter the periodicity and intensity of the vibration cycle are those modifying the biomechanical properties of the margin of the vocal folds [59–61]. In LPR disease, any such alteration is due to an inflammatory reaction caused by potentially noxious materials, including gastric acid, pepsin and pancreatic enzyme irritation. Based on a few disparate studies, we suspect that the promotion of irregular vocal fold vibration occurs secondary to the combination of various conditions, including dryness of the vocal folds, keratosis, thickening of the epithelium, alterations of the Reinke space (such as Reinke space dryness, not necessarily oedema) and, in some cases, ulcerative lesions and granulomas [8, 62–65]. Another mechanism explaining these findings concerns a possible muscular hyperfunctional effect secondary to a surface inflammatory reaction. Genetically, we are not all equal in our vocal fold tissue response to gastric aggression. In practice, some alterations (i.e., moderate or severe Reinke oedema, ulcerative lesions or granulomas) are easily visible with videolaryngostroboscopic studies, but others are more difficult to objectify (i.e., epithelial thickening, epithelium and Reinke space dryness). Based on theories claiming that jitter and shimmer values could significantly increase with vocal fold oedema [61], oedema of the vocal folds and incomplete glottal closure were presented as the most important findings leading to the deterioration of the pattern of phonation. However, in light of the small number of robust trials on the subject and the little consideration given in recent papers to the biomolecular composition of the vocal cords [66, 67], we believe that this hypothesis has not yet been sufficiently confirmed [38].
In addition, even with the most advanced videolaryngostroboscopic techniques, it remains very difficult to assess and stratify oedema of the vocal folds.

Nonetheless, our study demonstrated several notable differences in the methods used to measure acoustic parameters. Broadly, the varying results for shimmer, jitter, and other acoustic values might reflect the different approaches used in the literature, described in Tables 2 and 3. Indeed, the results of the acoustic measurements depend on the type of vowel recorded, the duration of the analysed segment, and the method of choosing the selected interval [57, 68, 69]. While some authors have measured the acoustic parameters on the centre of vowel production [33, 40, 70], others have selected the most stable portion in an objective manner [38, 52]. As demonstrated in various reports, these differences in methods influence the final results and make comparisons very difficult [57, 68, 69]. Moreover, as described in the tables, the method of acoustic analysis was not explained in many papers. In addition, the different therapeutic schemes used could complicate comparisons between studies. In the majority of the reviewed articles, we found that the patient inclusion criteria differed between studies, which could be considered potential sampling bias. A few studies reported the existence of different patient profiles leading to differences in treatment and outcome responses. PPI therapy was more effective in patients with GERD and LPR than in patients with LPR without GERD [71–73]. Moreover, Shaw et al. and Lechien et al. showed significant differences between LPR patients with hoarseness and patients without/with low hoarseness [23, 38]. In everyday practice, we have also found that there are two profiles of patients depending on the presence of hoarseness, with less hoarse patients in the mild LPR group [68]. The LPR grade is also important to consider, given that mild LPR is not often associated with severe laryngeal pathology.
It has been suggested that the condition of patients with mild LPR disease could improve without medical treatment, through lifestyle and dietary changes alone [74]. These differences in the behaviour of various patient profiles in response to medical treatment have not been considered in most studies, leading to bias in the interpretation of the results of the trials, i.e., treatment and outcome efficacy. For example, the results of the Selby study could be explained in part by the over-representation of mild LPR patients who did not have laryngeal lesions sufficiently severe to influence frequency and intensity measurements before treatment [53]. Thus, it is extremely difficult to generalise the results obtained in most studies because the methodology significantly impacts the results. Other factors were left unmentioned, such as smoking and alcohol use, both of which can influence acoustic measurements through induced pharyngo-laryngitis.

Aerodynamic measures

Finally, aerodynamic measurements have been studied in a few publications [24, 28, 30]. It has been suggested that MPT could be affected by LPR disease via two mechanisms: first, by the bronchial irritation caused by LPR and second, by the incomplete adduction of the vocal folds due to inflammatory reactions [75]. Rather, based on theories of phonation [76, 77], we believe that the decrease in MPT is primarily due to inflammatory tissue changes (i.e., Reinke's space dryness), leading to an alteration in the flexibility of the vibrating margin of the vocal folds, as described in other vocal diseases [78]. The controlled studies of Pribuisiene and Kumar tended to confirm the reduction in MPT, in contrast to the study of Wan et al. [13, 30, 79]. However, Sahin et al. did not find significant differences in MPT between GERD and LPR patients after fundoplication [28]. The study of Hamdan et al. reported a non-significant improvement in MPT after medical treatment [24]. The small number of trials studying aerodynamic measurements limits the analysis of the use of these outcomes.

Overall, this review also highlights substantial heterogeneity between studies in the diagnostic methods used. Some authors relied on an assessment of symptoms and signs based on their clinical knowledge [23, 35, 70], while others used common tools, i.e., questionnaires [27, 28, 38, 40] and/or pH-metry [25, 26, 29, 79]. To date, none of these methods or tools seems to be commonly accepted, given the imperfections that characterise the various studies using them [19]. These methodological differences stem from the current controversy over the diagnostic method and limit the interpretation of the results of this review. Another aspect limiting the conclusions that can be drawn is the disparity between studies in the treatment regimen. Few studies have considered the existence of a placebo effect [80] or the potential effect of the dietary regimen on the improvement of LPR symptoms [81].

Conclusion

This review had a number of limitations. Most importantly, the quality of the included studies was modest in most cases. The majority of publications suffered from having a small number of patients, and they did not provide full information concerning the profiles of LPR. Only two publications had a high level of evidence [37, 70]. Second, the techniques for establishing diagnoses in individual patients varied from one study to another, which might have generated bias in the comparison of results, given the selection of profiles of patients who responded differently to treatment. Third, in some papers, it seemed that LPR patients were not followed for a sufficient period of time to observe significant treatment effects in some measurements that require more time to improve [24, 25, 35]. Fourth, the myriad treatment regimens and the limitations in length of follow-up did not favour comparison between studies and did not allow us to draw clear conclusions about the role of each drug in the improvement of LPR complaints. Despite these limitations, our review reported several notable findings. First, voice quality seemed to improve after treatment in most studies, regardless of the voice evaluation approach and tools used. These observations suggest that subjective and objective voice quality outcomes could be useful measurements to show the effectiveness of treatment over time. Among the objective measurements, acoustic parameters seem to be useful as treatment outcomes in LPR patients, and under certain conditions, they could be used to better understand the LPR pathophysiological mechanisms underlying the development of communicative disability. Nevertheless, the methods used to measure acoustic parameters were not standardised, leading to bias and imperfect conclusions about the sensitivity of the acoustic parameters for measuring the effectiveness of treatment. More studies are necessary to standardise the method of diagnosis and the treatment regimen. 
Considering the changes in voice quality in LPR patients, further prospective studies are needed using a standardised multi-dimensional assessment of voice quality, including subjective, acoustic and aerodynamic assessments.