Introduction

Gastroesophageal reflux disease (GERD) is a common medical condition described as a chronic manifestation of acid exposure to the esophagus which causes a myriad of symptoms sufficient to impair quality of life.1,2 In a systematic review of 77,671 patients, 25% of adults were reported to have an episode of heartburn at least once a month, and 12% had symptoms weekly, while 5% suffered from daily heartburn.3 Typical symptoms associated with GERD are heartburn and acid regurgitation, which are highly specific but not sensitive for the diagnosis of GERD.4 Temporally, acid reflux episodes variably corresponded to GERD symptoms in patients with and without pathologic GERD.5,6 Thus, the correlation between symptoms and pathologic GERD has been variable, at best.

Symptom assessment through standardized questionnaires such as the Mayo-GERD questionnaire (GERDQ) allows patients to self-report GERD symptoms and enables clinicians to assess the impact of GERD-related symptoms on patients.7 Assessing whether GERDQ can predict pathologic GERD is therefore an appealing extension of the use of this questionnaire. Predictive tools need to be concise, yet the GERDQ is lengthy, consisting of 80 questions. Achieving clinical utility with such a predictive tool would therefore require identifying a smaller subset of questions within the longer questionnaire.

Ambulatory 24-h esophageal pH monitoring has long been recognized as a standard for objectively measuring pathological GERD, with high sensitivity, specificity, and accuracy ranging between 84% and 100%.8–10 The uniform pH scoring system identifies six important parameters as predictors of GERD symptoms.10 In particular, percent total time pH < 4 (distal time (DT)) and DeMeester score (DS) are widely accepted as two key quantitative parameters of GERD. The DeMeester score is a sum of component scores of the six individual parameters (of which DT is one parameter). If DT > 4% or if DS ≥ 14.7, the test is considered diagnostic of pathologic GERD.10–12
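The diagnostic rule above can be written as a short sketch. The function and constant names are illustrative and not taken from the study; only the two cutoffs (DT > 4% and DS ≥ 14.7) come from the text.

```python
# Cutoffs as stated in the text; the names are illustrative only.
DT_CUTOFF_PERCENT = 4.0  # percent total time pH < 4
DS_CUTOFF = 14.7         # DeMeester score (unitless composite)


def pathologic_by_dt(dt_percent: float) -> bool:
    """True when percent total time pH < 4 is strictly above 4%."""
    return dt_percent > DT_CUTOFF_PERCENT


def pathologic_by_ds(ds: float) -> bool:
    """True when the DeMeester score is at or above 14.7."""
    return ds >= DS_CUTOFF


def pathologic_gerd(dt_percent: float, ds: float) -> bool:
    """Combined definition: either parameter is abnormal."""
    return pathologic_by_dt(dt_percent) or pathologic_by_ds(ds)
```

Note the asymmetry in the boundaries: a DT of exactly 4% counts as normal, whereas a DS of exactly 14.7 counts as abnormal.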

We hypothesized that a specific subset of questions which addressed the typical symptoms of GERD within the GERDQ are highly correlated with the presence of pathologic GERD, as defined by DT and/or DS parameters. The primary goal of this study was to determine whether and which components of GERDQ accurately identified patients with pathologic GERD as defined by 24-h esophageal pH testing.

Materials and Methods

Study Design

The study was approved by the University Health Network Research Ethics Board. The study design was a prospective cross-sectional evaluation of consecutively consenting patients (February 2003 to February 2008) with clinical symptoms compatible with gastroesophageal reflux disease (GERD) who were referred to the Esophageal Function Laboratory at Toronto General Hospital for 24-h esophageal pH monitoring. All were naïve to pH testing. Eligible patients were those with symptoms of GERD who were referred for 24-h pH testing and who could read and understand English. Patients were excluded from the study if they had previous antireflux surgery, had a known esophageal motility disorder, or were under evaluation pre- or post-lung transplantation.

After consenting to participate in the study, patients were given the GERDQ questionnaire to complete over the course of the pH testing. Motility and ambulatory 24-h pH testing were performed as outpatients. Patients were required to fast for at least 4 h prior to the testing. All antireflux medications were discontinued 1 week prior to the pH testing. No restriction was placed on patients’ daily activities, eating, drinking, or smoking habits. Data from GERDQ questionnaires were kept segregated from pH testing data until the time of analysis.

Mayo-GERD Questionnaire

Mayo-GERDQ is a validated self-administered questionnaire designed to measure symptoms of GERD consisting of 80 questions concerning patients’ experience of acid reflux episodes.7 The questionnaire addresses four major primary symptoms of GERD including heartburn, acid regurgitation, chest pain, and dysphagia, of which heartburn and acid reflux had the highest specificity for GERD.4 In addition, GERDQ also asked questions about atypical symptoms, lifestyle, general quality of life, general medical history, and review of other symptoms. Twenty-two GERDQ questions related to cardinal symptoms of heartburn and acid regurgitation were selected for in-depth analyses. These 22 questions either utilized a Likert scale or were Yes/No dichotomized questions and were classified into eight major categories: duration since the first onset of symptoms, frequency of symptoms, severity of symptoms, nocturnal symptoms, antacid medication, duration of antacid administration, history of gastric or esophageal diseases, smoking and drinking habits.

Manometry and Esophageal pH Monitoring

Esophageal manometry was first performed on all patients to identify the level of the lower esophageal sphincter. An eight-channel motility catheter was used to assess lower esophageal sphincter (LES) pressure when the patient was supine. Manometric LES values were identified through ten consecutive swallows with 5 cm3 aliquots of water. The amplitude and activity of peristaltic contraction, the upper esophageal sphincter (UES) location, resting and contraction tone, and coordination were also measured.

After removal of the motility assembly, esophageal pH was measured using a COMFORTEC (Sandhill Scientific), two-channel pH probe with 15 cm spacing. The distal sensor was positioned at 5 cm above the manometrically defined LES, while the proximal one was located at 15 cm above the LES. Esophageal pH was monitored and recorded electronically for a 24-h period. Patients were asked to maintain daily normal activities and diet. Using the GERD pH monitoring device (Sandhill Scientific) and Bioview pH software, the number and duration of reflux episodes were measured. Upright and supine acid exposure times (in percentage) were also calculated.

Statistical Analysis

Summary statistics of demographic variables, GERDQ responses, and pH testing results were generated. Because both DT and DS have been used as the reference for defining pathologic GERD, we included both in our primary analyses. Patients were classified as having either pathologic GERD or not on the basis of the standardized cutoffs for DT (no GERD, ≤4%; pathologic GERD, >4%) or DS (no GERD, <14.7; pathologic GERD, ≥14.7). Univariate logistic regression analysis was used to test the association of each of the selected GERDQ questions with pathologic GERD status. Statistically significant predictors of pathologic GERD from univariate analyses were entered into a multivariate logistic regression model using stepwise selection with p value cutoffs of 0.20 and 0.15 to enter and remain in the model, respectively.
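As an illustration of the univariate step, the sketch below handles only the simplest case, a Yes/No question. For a single binary predictor, the maximum-likelihood logistic coefficient equals the log of the sample odds ratio from the 2 × 2 table, so no iterative fitting is needed. The function name and the counts in the usage line are hypothetical, and the Wald test shown here is an assumption, since the text does not state which test statistic was used. Likert-scale questions would require a full logistic fit in a statistics package.

```python
import math


def univariate_binary_logit(a: int, b: int, c: int, d: int):
    """Logistic regression of GERD status on one Yes/No question.

    a: answered Yes, pathologic GERD     b: answered Yes, no GERD
    c: answered No,  pathologic GERD     d: answered No,  no GERD
    Returns (beta, standard error, two-sided Wald p value).
    All four counts must be nonzero.
    """
    beta = math.log((a * d) / (b * c))           # log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal tail
    return beta, se, p


# Hypothetical counts, for illustration only (not study data).
beta, se, p = univariate_binary_logit(60, 40, 40, 60)
print(f"beta={beta:.3f}  se={se:.3f}  p={p:.4f}")
```

Under the selection rule described above, a question with a univariate p value like this one would then be offered to the stepwise multivariate model.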

Using the estimated coefficients (β) to estimate the relative weight of each predictor in the multivariate model, a risk score was created from the GERDQ questions retained in that model. Receiver operating characteristic (ROC) curves were generated for the risk score (Fig. 1). Sensitivity (SS) as well as specificity (SP) at different risk score cut points were considered, with the final cut points chosen to maximize potential clinical utility.
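One common way to turn β coefficients into integer points is sketched below. The exact scaling rule used in the study is not described, so dividing by the smallest coefficient and rounding is an assumption, and the coefficient values are hypothetical. The sketch also treats every item as Yes/No, whereas the study's symptom-duration and age items had several categories.

```python
def betas_to_points(betas: dict[str, float]) -> dict[str, int]:
    """Scale each coefficient by the smallest nonzero one and round.

    Assumed rule: the weakest predictor is worth 1 point and the
    others are worth proportionally more.
    """
    smallest = min(abs(b) for b in betas.values() if b != 0)
    return {name: round(b / smallest) for name, b in betas.items()}


def risk_score(points: dict[str, int], answers: dict[str, bool]) -> int:
    """Sum the points of every item the patient answered positively."""
    return sum(points[q] for q, present in answers.items() if present)


# Hypothetical coefficients, for illustration only (not study data).
example_betas = {"Q2": 0.5, "Q5": 1.1, "Q7": 0.6}
example_points = betas_to_points(example_betas)
print(example_points)
print(risk_score(example_points, {"Q2": True, "Q5": True, "Q7": False}))
```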

Figure 1

Receiver operating characteristic (ROC) curves for the risk score, showing results for percent total time pH < 4 (DT) and DeMeester score (DS) separately.

“Potential clinical utility” for the risk score was considered likely if either of the following conditions was met: (1) C-statistic from ROC ≥ 0.85 or (2) having an optimal risk score cut point (that is, the cut point that maximizes SS and SP together) such that both SP and SS are over 80%. We planned to pursue validation of the risk score model using an independent set of samples only if “potential clinical utility” was met.
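The quantities behind these two criteria can be computed as in the following sketch. It reads "maximizes SS and SP together" as maximizing their sum, which is an assumption, and it assumes that a score at or above the cut point counts as a positive test. The default cut point range of 0 to 9 matches the score range reported in the Results. The example scores are hypothetical.

```python
def c_statistic(scores_pos, scores_neg) -> float:
    """Area under the ROC curve via pairwise comparison.

    Probability that a randomly chosen patient with pathologic GERD
    has a higher risk score than one without; ties count one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sens_spec(scores_pos, scores_neg, cut):
    """Sensitivity and specificity when score >= cut is test-positive."""
    ss = sum(s >= cut for s in scores_pos) / len(scores_pos)
    sp = sum(s < cut for s in scores_neg) / len(scores_neg)
    return ss, sp


def optimal_cut(scores_pos, scores_neg, cuts=range(0, 10)):
    """Cut point with the largest SS + SP (first one wins on ties)."""
    return max(cuts, key=lambda c: sum(sens_spec(scores_pos, scores_neg, c)))


# Hypothetical risk scores, for illustration only (not study data).
gerd = [4, 6, 7, 8]
no_gerd = [1, 2, 3, 6]
print(c_statistic(gerd, no_gerd), optimal_cut(gerd, no_gerd))
```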

Results

Demographics

Between February 2003 and February 2008, of consecutive patients referred for esophageal motility and pH monitoring, 374 met the inclusion criteria. Of these, 336 agreed to participate (90% annual recruitment rate) and completed both GERDQ and pH testing. Of patients completing GERDQ, 203 (60.4%) were females. The median (range) age was 49.8 (18–85) years (Table 1). Pathologic GERD was diagnosed in 49.4% based on the objective measurement of DT or DS, or both, on esophageal 24-h pH testing.

Table 1 Demographic and Esophageal pH Study Data of Patients with GERD Referred for 24-H pH Monitoring

Symptoms

In this patient population, 48% had episodes of heartburn, and 41% had episodes of acid regurgitation for more than 5 years. Over half of all patients (51%) reported “severe or very severe” heartburn, but only 50% and 54% of these patients had DT > 4% or DS ≥ 14.7, respectively. Similarly, 48% complained of severe or very severe acid regurgitation, but only 52% and 56% of these patients had DT > 4% or DS ≥ 14.7, respectively. Over 80% of patients reported being awakened at night from heartburn, and 75% reported nocturnal regurgitation. Daily heartburn occurred in 39% of patients within the previous year, while 47% described heartburn occurring at least once a week in the past year. Symptom improvement with antacids was reported by 58%. Heartburn or acid regurgitation affected daily activities some of the time in 35% of patients and most or all of the time in 21%. The 227 patients who had at least one episode of heartburn or acid reflux on 24-h pH testing had a median of seven (interquartile range 14) episodes of documented acid reflux based on DT or DS, but only 50% of GERD symptom episodes were correlated with DT > 4% or DS ≥ 14.7 (interquartile range 84%).

One in four patients had presented to their doctor’s offices six times or more in the previous year for GERD symptoms. Ninety percent had already received some sort of diagnostic test by their physician prior to referral for pH testing. Ever-smokers accounted for only 45% of the patients, while fewer than 38% of this patient sample drank alcohol in the last year; however, almost 70% were coffee drinkers.

DT, DS, and Pathologic GERD

The median and interquartile range for DT and DS are presented, for the overall sample and separately by pathologic GERD status, in Table 1. As expected, DT and DS values were strongly correlated (Pearson correlation coefficient = 0.98; p < 0.0001). Because of this high correlation, we used either high DT or high DS to define pathologic GERD in our initial demographic comparisons. When comparing pathologic GERD to no-GERD individuals, males had a significantly higher prevalence of pathologic GERD (p = 0.01, chi-square test), as did older individuals (p = 0.008, t test), when using this combined DT/DS definition of pathologic GERD (Table 1).

Univariate and Multivariate Models

Although we utilized a combined definition of DT/DS for our demographic variables to allow convenient reporting in Table 1, we performed our primary analysis separately for DT and DS, given that each is considered a standard in its own right. Univariate logistic regression analysis identified six out of 22 GERD-related questions with the greatest statistical significance for either DT or DS. These questions, identified as (Q1) through (Q6), were: (Q1) When did heartburn first begin? (Q2) Has heartburn awakened you at night? (Q3) Have you had acid regurgitation in the past year? (Q4) Has acid regurgitation awakened you at night? (Q5) Have you ever had hiatus hernia? and (Q6) Have you ever had disease of esophagus or stomach? In addition to these GERDQ questions, being male (Q7: What gender are you?) and older age (Q8: How old are you at the time of your pH testing?) were also associated with abnormal DT or DS (Table 2).

Table 2 Summary of Univariate Analysis: Questions in GERDQ Most Associated with Pathologic GERD (as Defined by DT or DS)

In multivariate logistic regression analysis, (Q1) through (Q8) were assessed using stepwise selection. Five predictors remained statistically significant or near significant after stepwise selection, (Q1), (Q2), (Q5), (Q7), and (Q8), and these data are presented in Table 3. For (Q1), the original Likert categories were partially collapsed based on the results of the univariate analysis and the frequency of each category (adjacent categories with few individuals were automatically collapsed together). For (Q8), the log-odds of the risk function for age approximated linearity; thus, dividing the sample into tertiles was chosen for convenience in developing the risk score model.

Table 3 Multivariate Analysis: Final Models

At least one question was not answered by 98 patients (29%), and therefore these patients could not be included in the multivariate analysis. However, using chi-square analysis, there was no difference between the group with missing questions and the group with complete answers in terms of the frequency of pathological GERD either by DT, DS, or both. Also, answers to the eight questions were similar between the two groups at least for the questions that were answered (data not shown).
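A comparison of this kind can be sketched as a 2 × 2 chi-square test of GERD frequency in the incomplete versus complete responders. The text does not say whether a continuity correction was applied, so the uncorrected Pearson statistic is assumed here, and the counts are hypothetical. For one degree of freedom, the upper-tail probability can be obtained from the complementary error function in the standard library.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square test for a 2 x 2 table.

    a: incomplete answers, pathologic GERD   b: incomplete, no GERD
    c: complete answers,   pathologic GERD   d: complete,   no GERD
    Returns (chi-square statistic, p value with 1 degree of freedom).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, df = 1
    return chi2, p


# Hypothetical counts, for illustration only (not study data).
chi2, p = chi_square_2x2(50, 48, 116, 122)
print(f"chi2={chi2:.3f}  p={p:.3f}")
```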

Risk Score Creation and Assessment of Risk Score Characteristics

Risk scores were weighted, with the weighting based on estimated β values of each question in the multivariate models (Table 4). The risk score developed for DS and DT using these criteria had a range from 0 through 9, but the weighting was slightly different for DS and DT (Table 4).

Table 4 Risk Score Index for the Significant GERDQ Questions from Multivariate Model

ROC curves were generated and yielded C-statistics for DT and DS of 0.75 and 0.73, respectively. Thus, this risk score fails criterion (1) for “potential clinical utility.”

The optimal risk score cut point for DT was ≥5, and the optimal cut point for DS was ≥6. At these cut points, SS were 68% and 82% and SP were 72% and 60% for DT and DS, respectively. Thus, this risk score does not meet criterion (2) for “potential clinical utility” either (see Table 5).
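Applying the two prespecified criteria to the reported values can be sketched as follows. The function name is illustrative; the thresholds come from the Statistical Analysis section, and the inputs are the C-statistics, sensitivities, and specificities reported above.

```python
def has_potential_clinical_utility(c_stat: float, ss: float, sp: float) -> bool:
    """Either criterion is sufficient.

    (1) C-statistic of at least 0.85, or
    (2) sensitivity and specificity both over 80% at the optimal cut point.
    """
    criterion_1 = c_stat >= 0.85
    criterion_2 = ss > 0.80 and sp > 0.80
    return criterion_1 or criterion_2


# (C-statistic, SS, SP) at the optimal cut point, as reported in the text.
reported = {
    "DT": (0.75, 0.68, 0.72),
    "DS": (0.73, 0.82, 0.60),
}

for name, (c_stat, ss, sp) in reported.items():
    print(name, has_potential_clinical_utility(c_stat, ss, sp))
```

Both definitions fail both criteria: the DS cut point clears 80% sensitivity but not 80% specificity, and neither C-statistic reaches 0.85.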

Finally, when considering other risk score cut points, values for sensitivity and specificity for all possible cut points were well below 90%, except for the extreme risk score cut point of ≤2, which had SS values of 97–100% (range is reported as DS and DT analyses were performed separately) and negative predictive values of 88–100% in this population. As expected, SP values were very low for this cut point (below 50%). Furthermore, the percentage of individuals (in this population of referred patients for pH testing) that had risk score values ≤2 was only 10–15% (see Table 6).
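The link between sensitivity, specificity, prevalence, and negative predictive value can be sketched with Bayes' rule. In the usage lines, the sensitivity of 0.97 and the prevalence of 0.494 are taken from the text (the prevalence is for the combined DT/DS definition), while the specificity of 0.20 is a hypothetical value chosen only because the text states it was below 50%.

```python
def negative_predictive_value(ss: float, sp: float, prevalence: float) -> float:
    """Probability of no pathologic GERD given a score below the cut point."""
    true_neg = sp * (1.0 - prevalence)
    false_neg = (1.0 - ss) * prevalence
    return true_neg / (true_neg + false_neg)


def fraction_test_negative(ss: float, sp: float, prevalence: float) -> float:
    """Share of the whole population falling below the cut point."""
    return sp * (1.0 - prevalence) + (1.0 - ss) * prevalence


# ss and prevalence from the text; sp = 0.20 is a hypothetical value.
npv = negative_predictive_value(ss=0.97, sp=0.20, prevalence=0.494)
share = fraction_test_negative(ss=0.97, sp=0.20, prevalence=0.494)
print(f"NPV={npv:.2f}  share below cut point={share:.2f}")
```

The sketch makes the trade-off explicit: a cut point with very high sensitivity can rule out pathologic GERD fairly reliably, but when specificity is low it applies to only a small share of referred patients.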

Discussion

Despite a number of GERD questionnaires that were designed to provide reliable assessments of GERD symptoms, comparisons between questionnaires have been difficult, and interpretation of results has varied greatly.13–18 Intuitively, clinicians have often assumed that GERD symptoms reported in a questionnaire would accurately represent pathologic GERD. If there is little or no association of GERD symptoms and objective esophageal 24-h pH measurements, misdiagnosis and inappropriate treatment for GERD may result. To our knowledge, our study is the first attempt to correlate subjective self-reported GERD symptoms using GERDQ with objective quantification of key parameters used in 24-h pH testing.

Previous studies have reported that even if reflux symptoms are eliminated completely, it might not ensure normalization of esophageal pH reading.19 Nor does persistence of GERD symptoms imply pathological GERD. In our study, we identified five key questions that were highly associated with the presence of pathologic GERD in this highly selected patient population who were referred for investigation of reflux symptoms. These questions asked about: (1) the duration since the start of heartburn (at least 2–5 years ago but especially if more than 5 years ago); (2) nocturnal symptoms of heartburn; (3) previous diagnosis of hiatus hernia; (4) being male; and (5) being older, all of which were shown to be statistically significant in the multivariate model. Yet with the rare exception of extremely low risk score values, the resultant risk score derived from these questions was not at all useful in discriminating pathologic GERD from those with false-positive symptoms. Furthermore, despite self-reported severe symptoms, only approximately half the patients actually had pathologic GERD based on objective testing.

Table 5 Number of Patients with Risk Score Results by Pathologic GERD, as Defined by DT and DS, Separately Presented
Table 6 Risk Score by Pathologic GERD, Using a Cut Point of ≤2, for DT and DS Separately

Similar to our study, Klauser et al.4 found that GERD-related symptoms, particularly heartburn and acid regurgitation, were highly associated with pathologic GERD but were not particularly discriminatory for predicting pathologic GERD. Similarly, Schlesinger et al.20 reported that 24-h pH monitoring was normal in half of the individuals with reflux symptoms and in 29% with erosive esophagitis. By all of our prespecified criteria (see “Statistical Analysis” section), our risk scores fell short of potential clinical utility for predicting pathologic GERD.

Various subjective diagnostic tools for GERD have been compared to objective 24-h pH monitoring. Klauser et al.4 compared personal interview by gastroenterologists to pH testing, where there was some correlation between an experienced gastroenterologist’s subjective assessment and pathologic GERD. Ghoshal et al.21 compared another standardized questionnaire (Carlsson–Dent), esophageal biopsy findings, and treatment responses with omeprazole with pH testing and found some correlation between severity of symptoms and severity of pH findings. Our study focused on comparing a self-reported questionnaire of GERD symptoms, GERDQ, with pH testing results and found a general lack of clinical utility from GERDQ to predict pathologic GERD.

Nocturnal reflux symptoms are often considered one of the key symptoms of GERD, and this association was confirmed in our study. Weigt et al.22 found that individuals with more typical symptoms of heartburn and regurgitation were associated with greater nocturnal esophageal acid breakthrough on pH testing in patients who were already on proton pump inhibitors. Another study reported low specificity (65%) of nocturnal heartburn and greater specificity using nocturnal acid regurgitation (88%) and cough at night (100%), and the study also reported low sensitivity with each of these symptoms.23 It is likely that both nocturnal acid regurgitation and heartburn are associated with pathologic GERD, and both questions were strongly associated with pathologic GERD in our univariate analysis. However, the tight correlation between these two variables likely led to one of them dropping out of the multivariate model.

Our study found that patients with hiatus hernia were strongly associated with both abnormal DT and DS. This result corresponds to two studies. DeMeester et al.24 reported that acid reflux episodes were found in greater proportion of patients who had a diagnosis of hiatus hernia compared with those without such a diagnosis (83% and 43%, respectively). Jenkinson et al.23 reported that hiatus hernia alone could detect abnormal nocturnal acid reflux with 79% sensitivity and 76% specificity; furthermore, when hiatus hernia and nocturnal reflux symptoms (heartburn, acid regurgitation) were present together, specificity increased to 100%. Together, these data are all consistent with our present results.

We confirmed that men are significantly more likely to have pathologic GERD than women, consistent with previously reported findings.25–27 Lin et al.28 presented complementary data whereby, in men and women who had similar pH testing results, women reported greater severity of GERD symptoms (heartburn, acid regurgitation, nocturnal symptoms) than men. Richter and DeMeester29 theorized that a greater parietal cell mass in men leads to greater acid secretion in men, but this does not explain the differences in symptom perception. Lin et al.28 suggested that higher symptom perception and lower pain threshold in women might account for some of these differences. In addition to gender, older men were found to experience longer episodes of reflux than younger individuals of either gender in one study.26 We confirmed the independent association between increasing age and higher rates of pathologic GERD but did not find the age–gender interaction described in this other study.26

Self-reported GERD questionnaires can be useful. Andersen et al.30 found GERD-related questions to have high sensitivity. Symptom indicators successfully identified almost two thirds of patients with symptoms such as nocturnal heartburn, chest pain, and dysphagia. However, this study compared individuals having benign esophageal disease with individuals having angina pectoris, gastric and duodenal ulcers, or “normal” healthy populations which were vastly different from our underlying patient population. In addition, Shimoyama et al.31 also evaluated nine questions from a 50-item questionnaire with a high sensitivity of 80% (compared to the original 50-item questionnaire); this study did not employ pH testing. While endoscopy may be useful to exclude non-GERD cases, there was also wide variation in the accuracy of diagnosing pathologic GERD using the surrogate endoscopic marker of “mucosal breaks,” and this variation depended greatly on endoscopists’ experience.32

There are several limitations of this study. First, patients were all referred by their physicians for the esophageal motility and pH testing either because of poor response to drug therapy, referring physician’s suspicions that the symptoms were not related to GERD, or prior to consideration of antireflux surgery. This would lead to potential selection bias towards both extremes: overrepresentation of severe pathologic GERD cases and overrepresentation of atypical GERD-symptom patients without pathologic GERD. However, under these circumstances, one would have expected a higher chance of identifying a clinical subset of questions that could discriminate pathologic GERD from no GERD, which was not what we found. Our results are further confounded by the fact that physician referral is typically based on the physician’s assessment of a patient’s GERD symptoms, and agreement between physicians’ and patients’ perceptions of GERD symptoms is often poor.33 Secondly, we assessed only one questionnaire, GERDQ. Although this is a validated questionnaire in other settings, it is possible that other questionnaires could be more discriminatory for pathologic GERD in the setting of referral for pH testing. Despite these concerns, we chose GERDQ because it has been validated and assesses multiple dimensions of the most specific symptoms of GERD, heartburn and acid regurgitation. Thirdly, we assessed a very specific patient subgroup referred for pH testing as a result of our initial hypothesis. As shown in our results, our patients had a high prevalence of symptoms with specificity for pathologic GERD, including nocturnal symptoms, severe and frequent acid regurgitation, and/or heartburn symptoms often of prolonged duration. Thus, the usefulness of GERDQ in other settings, such as use as a general population screening tool or to correlate with impact on activities of daily living, cannot be generalized from this study. 
Finally, approximately one third of patients left at least one question unanswered, and we examined whether results for patients with missing data differed statistically from those for patients with complete data. Such a discrepancy could affect the validity of the multivariate analysis. However, when the frequency of pathologic GERD, as evidenced by DT > 4% and/or DS ≥ 14.7, was compared, none of these outcomes differed statistically between the missing and nonmissing groups. When the eight questions were first studied in the univariate analysis, the difference between the missing and nonmissing groups was found to be statistically significant only in the question, “Have you ever had acid regurgitation last year?” (p = 0.01). Yet, this question was later discarded, and the overall difference between missing and nonmissing groups remained negligible for the multivariate analysis. This suggests that these predictor questions carried equally valid beta coefficients for risk score development regardless of whether complete or missing information was used.

Conclusion

Using a self-administered standardized and validated questionnaire, GERDQ, our study found that abnormal 24-h esophageal pH monitoring was associated with the following characteristics: prolonged history of GERD-like symptoms, nocturnal heartburn, history of a hiatus hernia, and male gender. Despite statistically significant associations, these questions lacked clinical utility to predict pathologic GERD in patients referred for pH testing. Furthermore, pathological GERD as determined by 24-h pH testing was present only in approximately half of the patients despite severe self-reported symptoms.

The clinical implications of this study are significant inasmuch as patients with GERD symptoms are frequently treated with proton pump inhibitors or other acid-suppressing medications without objective evidence of pathological GERD. Our study demonstrates that 51% of patients with severe GERD symptoms do not have true pathological GERD on objective testing, and treatment with acid-suppressing medication would be inappropriate. Similarly, patients who have had antireflux surgery who subsequently complain of GERD symptoms should have objective testing before acid-suppressing medications are prescribed, since symptoms do not correlate with actual acid reflux.