Introduction

Proton-pump inhibitors (PPIs) are the most effective drugs in current treatment of acid-related gastrointestinal disorders such as gastroesophageal reflux disease (GERD) and gastric or duodenal ulcer with or without Helicobacter pylori infection. At present, five different substances are on the market, and they have been extensively studied in numerous comparative clinical trials [1–3]. Data from meta-analyses indicate little difference in cure rates of acid-related diseases (e.g., GERD) at the approved doses of PPIs. Likewise, H. pylori eradication rates did not differ substantially among PPIs, indicating similar efficacy at approved doses that differ because the drugs differ in potency [4, 5].

For the majority of the population, the currently used PPI doses may lie on the flat plateau of the dose-response curve; therefore, comparative trials with clinical endpoints cannot yield precise estimates of the relative potency of the PPIs. To our knowledge, no generally accepted tables or guidelines on equipotent PPI dosages have been developed. In clinical practice, however, knowing the dose needed to obtain a certain effect in a patient is very important. Such information would also provide the rationale for dose selection when replacing one PPI with another, which is often desired because hospital formularies may be limited to a single PPI. In addition, despite the generally high efficacy, there are numerous poor responders to PPI treatment; in such patients, one might wish to switch between substances and would need to know the relative potencies more precisely. Finally, it may be relevant to know what PPI potency can be obtained at what price. Given the high prevalence of acid-related disorders, this question has major pharmacoeconomic relevance.

Pharmacodynamic effects of PPIs can be assessed in different ways. One common parameter is derived from continuous measurement of intragastric pH over 24 h. In patients with GERD, especially those not responding to usual therapies, esophageal and gastric pH monitoring is a useful technique for assessing compliance and pH control and for investigating the association of reflux with therapy-refractory symptoms [6]. Intragastric pH monitoring in healthy individuals allows direct assessment of the acid suppression achieved with an agent and is useful for head-to-head comparisons of antisecretory therapies; it may also help clinicians with dose titration and with evaluating the effect of switching agents. In clinical trials, acid suppression is typically summarized as the mean pH over a defined time interval or as the percentage of time during which the pH is above 4 (used as a surrogate parameter for the healing of GERD) or above 3 (used as a surrogate parameter for the healing of peptic ulcer) [7–9].
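To make the two summary parameters concrete, the sketch below computes the mean pH and the percentage of time with pH above a threshold (4 for GERD, 3 for peptic ulcer) from a single recording. It is a minimal illustration, assuming equally spaced readings (so the share of readings equals the share of time) and an arithmetic mean of the pH values; the function name and the example trace are hypothetical and are not data from any included study.

```python
from statistics import fmean


def summarize_ph_trace(ph_values, threshold=4.0):
    """Return (mean pH, percent of time with pH above `threshold`).

    Assumes equally spaced readings, so the share of readings above the
    threshold equals the share of recording time above it.
    """
    if not ph_values:
        raise ValueError("pH trace is empty")
    mean_ph = fmean(ph_values)
    n_above = sum(1 for ph in ph_values if ph > threshold)
    return mean_ph, 100.0 * n_above / len(ph_values)


# Hypothetical hourly readings over 24 h (illustration only, not study data)
trace = [1.5] * 6 + [3.5] * 6 + [4.5] * 12
mean_ph, pct_above_4 = summarize_ph_trace(trace)              # GERD surrogate (pH > 4)
_, pct_above_3 = summarize_ph_trace(trace, threshold=3.0)     # ulcer surrogate (pH > 3)
print(f"mean pH {mean_ph:.2f}; time pH > 4: {pct_above_4:.0f}%; time pH > 3: {pct_above_3:.0f}%")
```

Real recordings are often sampled every few seconds and may contain artifacts or gaps; a production analysis would weight readings by their time interval rather than count them.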

When evaluating variability in individual gastric pH response, factors such as dose timing and food effects, as well as pharmacogenetic factors, play a role [10, 11]. Differences in intragastric pH have been described between Helicobacter pylori (Hp)-positive and Hp-negative individuals, with higher mean pH values and better efficacy of PPIs in Hp-positive individuals [12–14]. Gastric pH values may also differ in patients with GERD, since gastric acid production is significantly higher in GERD patients than in non-GERD patients owing to differences in nocturnal acid production [15]. Finally, the CYP2C19 genotype has been shown to have substantial effects on PPI exposure, with higher plasma concentrations and better acid suppression in poor metabolizers [16]. This, however, is epidemiologically more relevant in Asian countries, where the frequency of the CYP2C19 poor-metabolizer genotype is about 20%, compared with only about 3% in Caucasians [17]. Ideally, the individual CYP2C19 genotype would be available for the comparison, but most studies did not provide these data.

For all the reasons given above, a comparative analysis has to consider the underlying disease, and potency of PPIs can only be compared within a given group (healthy volunteers, ulcer patients, GERD patients) and within one ethnic group (Africans, Asians, or Caucasians). In this systematic analysis, we evaluated all available clinical studies measuring effects on gastric pH in relation to drug and dose in order to generate information on clinically comparative dosages. Specifically, we derived equivalent dosages of PPIs from the mean 24-h pH and from the percent of time with pH > 4, taking the diagnosis of study participants into account.

Methods

Data search and selection

A systematic search in Medline (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) was performed using the international nonproprietary names of PPIs (esomeprazole, lansoprazole, omeprazole, pantoprazole, rabeprazole) in combination with key terms “intragastric,” “gastric,” “pH,” “acid suppression,” “mean pH,” “intragastric acidity,” “acid secretion,” “24-hour.” In addition, we manually searched the bibliographies of key original or review articles for references not captured by the systematic keyword search strategy. Since we found few data beyond that, we decided not to include articles from non-peer-reviewed journals, data published only on the manufacturer’s websites or given in the drug labeling, conference abstracts, or studies available only on the webpage of the U.S. Food and Drug Administration (FDA) (http://www.fda.gov/cder/approval/index.htm).

Two independent reviewers extracted the information from the relevant articles into a database. Data were taken from the text, tables, and figures. All clinical studies published through January 2007 in patients with GERD or in healthy volunteers with intragastric 24-h pH monitoring were included. We restricted the patient studies to those in GERD because very few studies with 24-h pH assessment were available for other diseases such as peptic ulcer or Zollinger-Ellison syndrome. Studies performed exclusively in Hp-positive healthy volunteers or patients were analyzed separately [12, 13]. Finally, included studies were limited to those performed in Caucasians, since substantial interethnic differences between Caucasian and Asian populations influence the relationship between dose and effects on intragastric pH; these include the frequency of CYP2C19 metabolizer genotypes, which is relevant for all PPIs, other genetic differences, mean body mass index, and the epidemiology of diseases [17, 18]. Thus, data obtained in Asian populations must be analyzed and published separately and may even result in a different ranking of the relative potencies of the PPIs compared with Caucasian populations, owing to a differential impact of the CYP2C19 genotype on the different PPIs.

Finally, we included only data obtained with standard solid oral formulations (tablets or capsules). All doses provided in this review refer to the dosing information given by the manufacturer. Rabeprazole doses refer to rabeprazole sodium, which contains 94.2% rabeprazole as the active pharmaceutical ingredient (API), whereas doses of all other PPIs refer to the base equivalent (100% API); the true rabeprazole dose is therefore 94% of the labeled dose.
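The salt correction amounts to a single multiplication. The sketch below is illustrative only: the review itself reports labeled doses, so this conversion matters only when doses are compared on a base-equivalent basis. The constant and function names are hypothetical; the 94.2% fraction is the value stated above.

```python
# Fraction of rabeprazole base in rabeprazole sodium, as stated in the text (94.2%)
RABEPRAZOLE_BASE_FRACTION = 0.942


def rabeprazole_base_mg(labeled_sodium_dose_mg):
    """Convert a labeled rabeprazole sodium dose (mg) to the rabeprazole base dose (mg)."""
    return labeled_sodium_dose_mg * RABEPRAZOLE_BASE_FRACTION


for labeled in (10, 20, 40):
    print(f"{labeled} mg rabeprazole sodium = {rabeprazole_base_mg(labeled):.2f} mg rabeprazole base")
```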

The aim was to gather data on as many drug dosages as possible. Therefore, relatively broad inclusion criteria were chosen. Since only the dose-effect relationship was analyzed, we did not restrict the search to randomized controlled trials. In particular, phase I trials on intragastric pH in healthy volunteers were included even though they were conducted without a comparison group. If trials contained several study arms, the arms relevant to our analysis were included where possible.

Data analysis

The following parameters were taken from each study when available: identity and dose of the PPI, sample size, mean 24-h intragastric pH, and percentage of time with pH > 4 in 24 h. When available, mean values for the effect parameters and the respective parametric measures of variability (standard deviation, standard error of the mean, or confidence interval) were extracted. Standard errors of the mean and confidence intervals were converted to standard deviations for subsequent calculations. In some studies, only medians and nonparametric variability measures such as range or interquartile range were given. In these cases, we used the median instead of the mean but did not include the nonparametric variability parameters.
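The conversion to standard deviations can be sketched with the usual identities SD = SEM × √n and, for a symmetric confidence interval, SD = √n × (upper − lower) / (2z). This is a minimal sketch, assuming that reported intervals were 95% normal-approximation intervals for a mean; the text does not say which quantile was used, and for small samples a t quantile with n − 1 degrees of freedom would be more exact. Function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sd_from_sem(sem, n):
    """Standard deviation from the standard error of the mean: SD = SEM * sqrt(n)."""
    return sem * math.sqrt(n)


def sd_from_ci(lower, upper, n, level=0.95):
    """Standard deviation from a symmetric confidence interval for a mean.

    Assumes the interval was computed as mean +/- z * SD / sqrt(n) with a
    normal quantile z; for small n a t quantile (n - 1 df) would be more exact.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return math.sqrt(n) * (upper - lower) / (2 * z)


# Hypothetical summary statistics (not taken from any included study)
print(sd_from_sem(0.2, 25))        # SEM 0.2 with n = 25
print(sd_from_ci(3.6, 4.4, 25))    # 95% CI 3.6 to 4.4 with n = 25
```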

For studies with the same PPI and dosage, overall means weighted by sample size were calculated for the effect parameters as described in Eq. 1:

$$\bar{X} = \frac{\sum_{j=1}^{M} n_j \, \bar{x}_j}{N} \tag{1}$$

with \(\bar{X}\) being the overall mean, \(M\) the number of studies, \(N\) the overall number of subjects, \(n_j\) the number of subjects in study \(j\), and \(\bar{x}_j\) the mean of study \(j\). The overall variance \(S^2\) was calculated from the individual study means \(\bar{x}_j\), the sample sizes \(n_j\), and the standard deviations \(s_j\) of the individual studies (Eq. 2):

$$S^2 = \frac{\sum_{j=1}^{M} \left[ (n_j - 1)\, s_j^2 + n_j\, \bar{x}_j^2 \right] - N \bar{X}^2}{N - 1} \tag{2}$$

with \(j\) indexing the individual studies, \(n_j\), \(\bar{x}_j\), and \(s_j\) the sample size, mean, and standard deviation of study \(j\), \(M\) the number of studies, \(N\) the sum of the sample sizes of all studies, and \(\bar{X}\) the overall mean calculated as the weighted mean described above. The bracketed term equals the sum of the squared individual values in study \(j\), so the numerator is the total sum of squares around \(\bar{X}\). The overall standard deviation was calculated as the square root of this variance.
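The pooling in Eqs. 1 and 2 can be sketched as below, writing each study's contribution to the sum of squares in the standard form (n_j − 1)s_j² + n_j x̄_j². This is a minimal sketch, assuming that every study reports a sample standard deviation (denominator n − 1); how studies reporting only medians were handled in Eq. 2 is not specified in the text, so they are not treated here. The function name and example values are hypothetical.

```python
import math


def pool_studies(studies):
    """Pool per-study (n_j, mean_j, sd_j) summaries into an overall mean and SD.

    Implements Eq. 1 (sample-size-weighted mean) and Eq. 2 (overall variance
    reconstructed from each study's n, mean, and sample SD).
    """
    N = sum(n for n, _, _ in studies)
    if N < 2:
        raise ValueError("need at least two subjects in total")
    overall_mean = sum(n * m for n, m, _ in studies) / N                # Eq. 1
    # (n_j - 1) * s_j**2 + n_j * mean_j**2 equals the sum of squared values in study j
    sum_of_squares = sum((n - 1) * sd ** 2 + n * m ** 2 for n, m, sd in studies)
    variance = (sum_of_squares - N * overall_mean ** 2) / (N - 1)      # Eq. 2
    return overall_mean, math.sqrt(variance)


# Hypothetical studies with the same PPI and dose: (sample size, mean pH, SD)
print(pool_studies([(12, 3.4, 1.1), (20, 3.7, 0.9), (8, 3.2, 1.3)]))
```

A useful check of the design: splitting one raw dataset into arbitrary "studies" and pooling their summaries reproduces exactly the mean and sample SD of the whole dataset, because no information about the total sum of squares is lost.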

Studies on intragastric pH after a single dose of a PPI were analyzed separately from studies in which pH was measured under steady-state conditions after several days of administration of the same dose. Data for patients with GERD were analyzed separately from data in healthy volunteers. Studies that did not test for Hp status, studies that included Hp-positive individuals at roughly the proportion found in the general population, and studies excluding Hp-positive individuals were grouped together for analysis. In contrast, studies including only Hp-positive individuals were analyzed separately.

Estimation of equivalence dosages

All available data concerning 24-h pH (125 different groups with sample sizes between 4 and 65 and a total of 2,738 subjects) were analyzed in one integrated model using NONMEM V version 1.1. The individual studies were weighted according to their sample size. The 24-h pH obtained during steady-state treatment with the respective PPI was analyzed as a function of dose according to an Emax model with baseline pH (pHbasal) as follows:

$$\mathrm{pH} = \mathrm{pH}_{\mathrm{basal}} + \frac{E_{\max} \times \mathrm{dose}}{\mathrm{ED}_{50} + \mathrm{dose}}$$

or to calculate the doses required to achieve specific 24-h pH values:

$$\mathrm{dose} = \frac{\mathrm{ED}_{50}}{\dfrac{E_{\max}}{\mathrm{pH} - \mathrm{pH}_{\mathrm{basal}}} - 1}$$

Emax is the maximum achievable increase in gastric pH above the baseline value measured prior to treatment. ED50 is the dose required to obtain a half-maximum increase in pH; it was modeled with a variance parameter (IIV) in an exponential error model. Although formally a parameter for interindividual variation, IIV here reflected inter-disease variability, i.e., it described the differences among the three health-condition groups (healthy volunteers, Hp-infected individuals, GERD patients). Differences in potency of the PPIs were included in the final model as a factor (K) with lansoprazole as the reference, since the most detailed data were available for this PPI; K was therefore arbitrarily set to unity for this drug. Thus, with TVED50 as the overall population estimate of the dose required to achieve a half-maximum increase in pH, the final model was ED50 = TVED50 × exp(η) × K, where η is the group-specific random effect with variance IIV.
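The forward Emax equation, its inversion, and the ED50 structure can be sketched as follows. This is a minimal illustration of the deterministic part of the model only (not NONMEM code), assuming the exponential form ED50 = TVED50 × K × exp(η) implied by the exponential error model described above; the function names and all parameter values are hypothetical and are not the estimates reported in Table 3.

```python
import math


def emax_ph(dose_mg, ph_basal, emax, ed50_mg):
    """Mean 24-h pH predicted by the Emax model for a daily dose."""
    return ph_basal + emax * dose_mg / (ed50_mg + dose_mg)


def dose_for_ph(target_ph, ph_basal, emax, ed50_mg):
    """Inverse of the Emax model: daily dose needed to reach `target_ph`."""
    delta = target_ph - ph_basal
    if not 0 < delta < emax:
        raise ValueError("target pH must lie strictly between pH_basal and pH_basal + Emax")
    return ed50_mg / (emax / delta - 1)


def model_ed50(tv_ed50_mg, k_drug, eta_group=0.0):
    """ED50 = TVED50 * K * exp(eta): drug potency factor K, group random effect eta."""
    return tv_ed50_mg * k_drug * math.exp(eta_group)


# Hypothetical parameter values for illustration only (NOT the estimates in Table 3)
PH_BASAL, EMAX, TV_ED50 = 1.5, 4.0, 20.0
ed50 = model_ed50(TV_ED50, k_drug=1.0)            # reference drug (K = 1), eta = 0
print(emax_ph(30.0, PH_BASAL, EMAX, ed50))         # predicted mean pH at 30 mg/day
print(dose_for_ph(4.0, PH_BASAL, EMAX, ed50))      # dose for a mean 24-h pH of 4
```

Because the inverted equation is proportional to ED50, any multiplicative change in ED50 (a potency factor K or a group effect exp(η)) becomes the same multiplicative change in the dose needed for a given target pH, provided Emax and pHbasal are unchanged. Targets at or above pHbasal + Emax cannot be reached at any dose, which the sketch reports as an error.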

Potency factors were calculated by dividing the ED50 of omeprazole by the ED50 of each PPI, separately for the parameters “mean 24-h pH” and “percentage of time pH > 4.” Equivalence doses relative to a 20 mg omeprazole dose were then obtained by dividing 20 mg by the respective potency factor.
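The two divisions can be sketched as below. Under the model structure ED50 = TVED50 × exp(η) × K, the common terms cancel within a group, so the potency factor reduces to K(omeprazole) / K(drug); applying one factor across all doses also presumes a common Emax, consistent with the finding of no Emax differences among the PPIs. The function names, drug labels, and ED50 values are hypothetical, not the estimates in Table 3.

```python
def potency_factors(ed50_by_drug, reference="omeprazole"):
    """Potency factor = ED50(reference) / ED50(drug); the reference gets 1.0."""
    ref_ed50 = ed50_by_drug[reference]
    return {drug: ref_ed50 / ed50 for drug, ed50 in ed50_by_drug.items()}


def equivalence_doses(potency_by_drug, reference_dose_mg=20.0):
    """Dose of each drug equivalent to `reference_dose_mg` of the reference drug."""
    return {drug: reference_dose_mg / pf for drug, pf in potency_by_drug.items()}


# Hypothetical ED50 values in mg (illustration only, NOT the estimates in Table 3)
ed50_mg = {"omeprazole": 20.0, "drug_a": 40.0, "drug_b": 10.0}
pf = potency_factors(ed50_mg)
for drug, dose in equivalence_doses(pf).items():
    print(f"{drug}: potency factor {pf[drug]:.2f}, equivalent to omeprazole 20 mg at {dose:.1f} mg")
```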

Statistical testing

Differences among GERD patients, healthy volunteers, and Hp-positive volunteers as well as between single and multiple doses were analyzed using univariate analysis of variance (ANOVA) with study, treatment (single dose/multiple dose), and disease status (GERD or healthy) as factors and either 24-h pH or percent time pH > 4 as dependent variables. Statistical significance of two-group comparisons was assessed by the Student’s t-test.
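Because published studies usually report only group means, SDs, and sample sizes, a two-group Student's t-test can be computed from those summaries. The sketch below returns the pooled-variance t statistic and its degrees of freedom using only the standard library; it does not compute a P value, for which the t distribution is needed (for example, `scipy.stats.ttest_ind_from_stats` with `equal_var=True` returns both statistic and P value). The function name and input numbers are hypothetical.

```python
import math


def student_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic (pooled variance) from summary statistics.

    Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Hypothetical group summaries (mean 24-h pH, SD, n); not values from the review
t, df = student_t_from_summary(5.1, 0.5, 50, 4.8, 0.5, 50)
print(f"t = {t:.2f} with {df} degrees of freedom")
```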

Results

A total of 304 publications of potentially relevant clinical trials were obtained from which 151 citations were retrieved for further analysis. Trials using intravenous administration of the drugs, studies in patients with diseases potentially influencing gastric pH, and studies that did not perform 24 h of gastric pH measurement were excluded resulting in 57 clinical studies included in the analysis (Fig. 1). Forty-three of the studies were performed in healthy individuals, 12 in GERD patients, and 2 studies were performed exclusively in Hp-positive healthy individuals [19, 20]. Testing for Hp was performed in 36 of 57 studies, and individuals positive for Hp were excluded in 24 of the studies. In 10 studies, Hp-positive subjects (between 10 and 20%) were included. In 21 of the studies, Hp status was not tested and not used as an inclusion or exclusion criterion.

Fig. 1

Flow diagram on inclusion and exclusion of trials (randomized controlled trials, RCTs). Adapted from the QUOROM statement flow diagram [93]

Table 1 lists the mean intragastric pH measured over 24 h for different doses of each PPI, separately for single-dose and multiple-dose studies and for healthy individuals and GERD patients. The mean pH values and standard deviations are weighted according to study sample size.

Table 1 Mean intragastric pH ± standard deviation over 24 h after single doses and multiple doses of proton-pump inhibitors (PPIs)

For omeprazole, data on 10, 20, and 40 mg daily doses were found for single- and multiple-dosing conditions. GERD patients had lower mean 24-h pH values than healthy subjects, and multiple dosing increased pH significantly compared with single doses (P < 0.001 for both factors, ANOVA). Even at a relatively small dose of 20 mg, mean gastric pH was significantly higher after multiple dosing (3.5 in healthy individuals [21–23] and 3.6 in patients [1, 3, 24, 25]) than after single doses (1.8 in healthy individuals [2, 26–28]).

For esomeprazole, data on doses of 20, 40, and 80 mg were identified. Most studies were performed in patients with GERD, and these patients showed a trend towards lower mean pH than healthy individuals after 40 mg esomeprazole (4.4 vs. 4.8, P = 0.07, based on seven studies [1, 3, 24, 29–32]).

For pantoprazole, data on 10, 20, 40, 60, 80, and 120 mg doses were found. For the 40 mg dose, data were available for both healthy volunteers (n = 195) and GERD patients (n = 75). No difference in mean pH was detected between the two groups (mean pH 3.5 vs. 3.6).

Data on lansoprazole were identified for daily doses of 10, 15, 20, 30, 60, 90, 120, and 180 mg. Most data were available for the 30 mg dose (multiple dosing), and mean 24-h pH differed significantly between healthy individuals and GERD patients (5.1 versus 4.8, P = 0.004 for the 30 mg dose, t-test) [1, 3, 21, 23, 31–41].

Data on single and multiple doses of 5, 10, 20, and 40 mg rabeprazole were available. One study in 18 Hp-positive healthy individuals was included but analyzed separately. Hp-positive individuals had higher mean pH values at the given rabeprazole doses than other healthy individuals (P = 0.03 for multiple 20 mg doses). GERD patients had lower mean pH than healthy volunteers (mean pH after multiple 20 mg doses was 3.3 in GERD patients and 4.5 in healthy individuals, P = 0.004).

In addition to the mean pH over time, many clinical studies used the percentage of time with pH > 4 as a surrogate parameter for the efficacy of PPIs in the treatment of GERD. The data obtained for the percentage of time (over 24 h) with pH > 4 are shown in Table 2 as sample-size-weighted means and standard deviations in patients and healthy volunteers for single and multiple doses. As the single-dose data show, a single PPI dose is not sufficient to keep pH above 4 for 50% of the time or longer, whereas with multiple dosing over several days, higher doses kept pH above 4 for at least 70–80% of the time. There were no significant differences in mean percentage of time with pH > 4 between patients with GERD and healthy individuals for any of the PPIs.

Table 2 Mean percent duration of time with intragastric pH > 4 after single doses and multiple doses of proton-pump inhibitors (PPIs)

In Fig. 2, the reported percentages of time with pH > 4 for each PPI dose are depicted for GERD patients and healthy individuals. The values did not differ very much between GERD patients and healthy individuals, which was in keeping with similar results in several individual studies.

Fig. 2

The upper figure (a) shows the percentage of 24-h time at which gastric pH was above 4.0 in relation to the dose of the PPI. Only data from multiple dosing studies are shown. The lower figure (b) shows the mean 24-h gastric pH measurements from all multiple dose studies analyzed in the present review. These data are the basis of the nonlinear regression analysis presented in Fig. 3 and Table 3. Data measured in healthy volunteers are shown as circles, data from patients with GERD are shown as triangles and data from Helicobacter pylori positive healthy volunteers are shown as squares

Population pharmacokinetic analysis of the entire dataset identified no interindividual variation in Emax, the maximum possible increase in pH, and no discernible differences in Emax among the five PPIs. However, Hp-positive individuals formed a distinct group with a separate estimate of ED50. The estimated ED50 values and the estimated doses required to achieve a mean 24-h pH of 4 for each PPI are summarized in Table 3. Estimations were done only for multiple-dose conditions, since single doses did not allow the therapeutic goals to be met.

Table 3 Parameters reflecting proton-pump-inhibitor effects on 24-h gastric pH according to an integrated population pharmacokinetic model

The estimated relationships between dose and mean 24-h pH are depicted in Fig. 3 for each PPI, separately for healthy volunteers, GERD patients, and Hp-positive volunteers, over the dose ranges studied.

Fig. 3

Predicted mean 24-h pH values in relation to the dose of the PPIs. Predictions are shown only for the dose ranges studied, as given in Tables 1 and 2. The curves depicted are for healthy volunteers (circles), patients with GERD (triangles), and Hp-positive healthy volunteers (squares). The analysis yielding these mean estimates is described in the “Methods” section

Based on the mean 24-h gastric pH, the relative potencies of the five PPIs compared with omeprazole were 0.23, 0.90, 1.00, 1.60, and 1.82 for pantoprazole, lansoprazole, omeprazole, esomeprazole, and rabeprazole, respectively. Compared with healthy volunteers, patients with GERD needed a 1.9-fold higher dose and Hp-positive individuals needed only 15% of the dose to achieve a given increase in mean 24-h intragastric pH (Table 3). The data were insufficient to detect significant differences among the three subgroups (healthy, healthy with Hp, and patients with GERD) in basal pH prior to PPI treatment or in the maximum achievable pH increase (Emax).
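The reported factors can be applied with simple arithmetic. The sketch below uses the mean-pH potencies and the GERD (1.9-fold) and Hp-positive (15%) dose multipliers quoted above; it assumes, as in the model, that all of these act on ED50 with a common Emax, so they scale doses multiplicatively and the group factor cancels when comparing drugs within the same group. The function names are hypothetical, and the outputs are model-based arithmetic, not dosing recommendations or entries of Table 3.

```python
# Relative potencies versus omeprazole based on mean 24-h pH, as reported above
POTENCY_VS_OMEPRAZOLE = {
    "pantoprazole": 0.23,
    "lansoprazole": 0.90,
    "omeprazole": 1.00,
    "esomeprazole": 1.60,
    "rabeprazole": 1.82,
}
# Dose multipliers relative to healthy volunteers, as reported above
GROUP_DOSE_FACTOR = {"healthy": 1.0, "GERD": 1.9, "Hp-positive": 0.15}


def equivalent_dose_mg(drug, omeprazole_dose_mg=20.0):
    """Dose of `drug` with the same mean-pH effect as the given omeprazole dose (same group)."""
    return omeprazole_dose_mg / POTENCY_VS_OMEPRAZOLE[drug]


def group_adjusted_dose_mg(healthy_dose_mg, group):
    """Dose in `group` giving the same pH increase as `healthy_dose_mg` in healthy volunteers."""
    return healthy_dose_mg * GROUP_DOSE_FACTOR[group]


for drug in POTENCY_VS_OMEPRAZOLE:
    dose = equivalent_dose_mg(drug)
    print(f"{drug:>13}: {dose:5.1f} mg; same pH increase in GERD at "
          f"{group_adjusted_dose_mg(dose, 'GERD'):5.1f} mg")
```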

Discussion

The present analysis provides estimates of the relative potencies of PPIs based on peer-reviewed published data on gastric pH effects. This information provides a rationale for mutual replacement of PPIs in clinical practice and for scientific comparisons of studies with different PPIs. Intragastric pH measurements, particularly expressed as percent time with pH > 4 (Table 3), are established biomarkers of the therapeutic efficacy of PPIs [30]. Maintaining pH > 4 is an important objective in the management of GERD: when the pH of the refluxate falls below 4, patients may experience mucosal injury in the esophagus [42, 43].

Confounding factors and other limitations

There are several confounding variables leading to variability among the individuals within the studies and among the studies themselves; probably not every factor, but only the most important ones, should be considered in our meta-analysis. Ideally, for each of the five PPIs, the dose-response relationship would have been tested in the dose range between 5 and 200 mg under the same clinical study conditions in subjects well characterized for Hp status and disease status. This was not the case, which of course limits the reliability of the results of our meta-analysis. Nevertheless, our comprehensive analysis may shed new light on the relative potencies of the PPIs, which cannot be obtained from in vitro measurements (since these would, for instance, not reflect differences in pharmacokinetics) or from studies focusing on clinical endpoints, since these studies are less precise and have not been performed over a sufficient range of dosages.

Because of differences in basal pH and in responses to PPIs, we stratified the data into studies in healthy individuals, studies in GERD patients, and studies performed exclusively in Hp-positive individuals. We did not pool studies performed in Asian individuals with those in Caucasian individuals because of differences in body mass index and in genetically determined metabolizing activity. In Asian populations, about 20% are poor metabolizers of drugs metabolized by CYP2C19, including all PPIs [44].

The activity and genotype of the cytochrome P450 enzyme CYP2C19, which largely determines the pharmacokinetics of all five PPIs tested, were not directly considered in this analysis because in the majority of studies in Caucasians the CYP2C19 genotype was either not analyzed or not published. Although there is abundant evidence that the CYP2C19 genotype has a major impact on the pharmacokinetics of PPIs, only a few studies in Caucasians assessed its influence on mean 24-h intragastric pH. Thus, the CYP2C19 genotype could not be integrated into our model. However, since the CYP2C19 genotype influences drug metabolism, differences in mean pH between metabolizer groups can be estimated from the differences in drug exposure. In Table 4, the influence of the CYP2C19 genotype is expressed as a dose factor that can be derived from pharmacokinetic data on clearance or AUC (data taken from [10, 11, 45–50]) and transformed using the methods described earlier [51, 52]. Thus, the effects of genotype on mean pH can be extrapolated by applying the dose correction factor to the dose given to the patient to obtain a genotype-specific dose.
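A minimal sketch of the dose-factor idea, assuming the factor is simply the exposure ratio of a genotype group relative to extensive metabolizers (from AUC, or from clearance under linear pharmacokinetics, where AUC = F × dose / CL). The transformation actually described in [51, 52] may be more elaborate, and the numbers below are hypothetical, not the factors listed in Table 4. The adjusted dose could then be used in the dose-response relationship as if it had been given to an extensive metabolizer.

```python
def dose_factor_from_auc(auc_genotype, auc_extensive):
    """Exposure ratio of a genotype group versus extensive metabolizers (from AUC)."""
    return auc_genotype / auc_extensive


def dose_factor_from_clearance(cl_genotype, cl_extensive):
    """Same ratio from oral clearance, assuming linear pharmacokinetics (AUC = F * dose / CL)."""
    return cl_extensive / cl_genotype


def genotype_adjusted_dose(dose_mg, dose_factor):
    """Extensive-metabolizer-equivalent dose to use in the dose-response relationship."""
    return dose_mg * dose_factor


# Hypothetical pharmacokinetic values (NOT the factors listed in Table 4)
factor_pm = dose_factor_from_auc(auc_genotype=5.0, auc_extensive=1.0)
print(genotype_adjusted_dose(20.0, factor_pm))   # EM-equivalent dose for a poor metabolizer
```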

Table 4 Dose correction factors for CYP2C19 poor and intermediate metabolizers. Data are from [10, 11, 45–50, 91, 92]

One confounding factor that could not be optimally considered was Hp status. In 21 of the 57 included studies, Hp status was not tested, and in some studies, a small fraction of Hp-positive individuals was included. Since the percentage of Hp-positive individuals in the normal population is about 10–20% [53], and because in those studies including Hp-positive individuals this fraction was typically 16–20%, we decided to include such studies, as this fraction of potentially Hp-positive individuals would not drastically alter the mean pH of the whole group. In addition, this reflects the clinical situation, in which GERD patients treated with PPIs are usually not tested for Hp.
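The argument that a small Hp-positive fraction barely shifts the group mean can be made quantitative with a simple mixture: the group mean moves from the Hp-negative mean by the Hp-positive fraction times the difference between the two subgroup means. The sketch below assumes this simple two-component mixture; the function name and the pH values are hypothetical, chosen only to illustrate the size of the effect.

```python
def mixed_group_mean(mean_hp_negative, mean_hp_positive, fraction_hp_positive):
    """Expected group mean when a fraction of subjects is Hp-positive (simple mixture)."""
    f = fraction_hp_positive
    return (1 - f) * mean_hp_negative + f * mean_hp_positive


# Hypothetical mean 24-h pH values for illustration only (not data from the review)
for f in (0.0, 0.1, 0.2):
    print(f, round(mixed_group_mean(4.0, 5.0, f), 2))
```

With a hypothetical one-unit difference between subgroups, a 20% Hp-positive fraction shifts the group mean by only 0.2 pH units.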

We further neglected the time of dosing during the day and dose splitting, and referred only to the daily dosages. Of course, dose splitting has an effect on pharmacokinetics and also on intragastric pH, but since in clinical practice single daily doses are much more practical and the exact time of drug intake is often not known, we did not account for the influence of dose splitting during the day.

Dose equivalents according to the literature

Different national and international guidelines for dosing of PPIs exist. The WHO Collaborating Centre for Drug Statistics Methodology (http://www.whocc.no/atcddd/) proposes doses of 20 mg omeprazole, 30 mg esomeprazole, 30 mg lansoprazole, 40 mg pantoprazole, and 20 mg rabeprazole to be equivalent for the treatment of GERD. From this recommendation, it is not at all evident why a higher dose of esomeprazole is recommended than of racemic omeprazole. According to Table 3 and other analyses [54], it is well documented that 30 mg omeprazole corresponds more closely to 20 mg esomeprazole.

The Canadian Association of Gastroenterology recommends 20 mg omeprazole as equivalent to 40 mg esomeprazole, 30 mg lansoprazole, 40 mg pantoprazole, and 20 mg rabeprazole for acute treatment of GERD [55]. Again, according to the clinical data, esomeprazole does not appear to be less potent than omeprazole [54]. The recommendation of twice the omeprazole dose for esomeprazole may therefore rest on studies showing a particular value of the 40 mg esomeprazole dose [24], but apparently not on a comparison of pharmacologically defined potencies or efficacies.

The FDA provides dosages of the PPIs for approved indications and for the treatment of GERD (http://www.pbm.va.gov/reviews/ppiabbreviatedreview.pdf). Different dosages are given for symptom relief, for maintaining symptom control in nonerosive GERD, and for healing or maintaining symptom relief in erosive or ulcerative esophagitis. The omeprazole equivalents for symptom relief of GERD are given by the FDA as 1 for esomeprazole, 0.75 for lansoprazole, 2 for pantoprazole, and 1 for rabeprazole. These comparative doses were derived from relative efficacies based on subjective or objective measures of response to treatment reported in double-blind, randomized controlled trials or systematic reviews in patients with gastrointestinal acid-related disorders.

A meta-analysis by Caro et al. assessed the endoscopic healing of erosive esophagitis [56]. The doses yielding similar 8-week overall healing rates were omeprazole 20 mg/day, lansoprazole 30 mg/day, rabeprazole 20 mg/day, and pantoprazole 40 mg/day. These estimates are similar to our findings, although according to our data the omeprazole dose would be slightly higher.

Another review comparing pharmacokinetics, acid suppression, and efficacy of PPIs concluded from comparative studies of acid suppression that lansoprazole and pantoprazole have a potency similar to that of omeprazole, whereas rabeprazole had a greater potency than omeprazole on a per-milligram basis [57]. However, this was not a systematic analysis of all available data on acid suppression at different PPI doses, and therefore could only provide a rough estimate of dose equivalents.

In summary, guidelines for the different PPIs differ considerably in dosage, leading to a confusing picture of the comparative efficacy of PPIs. None of the compilations provides comprehensive dose recommendations, and individually selected studies often appear to have been the basis of the recommendations. This lack of true comparative efficacy data may partly reflect comparisons of a limited number of fixed doses that were not equivalent. In this context, using relevant data published in an accessible way and under peer-reviewed conditions, the present analysis attempted to derive dose equivalents based on the effect on gastric pH, which is known to be representative of clinical efficacy. Of course, our meta-analysis based on surrogate markers is not intended to change daily clinical practice in a typical patient. The equivalence doses assessed provide a rationale for appropriate dose selection when switching a patient to a different PPI and for calculating the cost-effectiveness of individual PPIs.