1 Introduction

Bronchodilators are drugs that dilate the bronchi and bronchioles, decreasing resistance in the respiratory airway and increasing airflow to the lungs. Typically, bronchodilators are used to treat obstructive lung diseases of which asthma and chronic obstructive pulmonary disease (COPD) are the most common. Bronchodilators are either short-acting or long-acting. Short-acting medications provide quick or “rescue” relief from acute bronchoconstriction. Long-acting bronchodilators help control and prevent symptoms. The main types of prescription bronchodilating drugs are β2-agonists (short- and long-acting) and anticholinergics (short- and long-acting). These drugs are administered directly into the lungs by inhalation. The β2-agonists are used to treat both asthma (in combination with anti-inflammatory medication) and COPD; the anticholinergics are used in COPD. While the short-acting drugs are used when needed, to provide fast temporary relief from symptoms and flare-ups for periods of 4–6 h, the long-acting drugs are taken at fixed intervals to control and prevent bronchoconstriction. Typically, the long-acting medication takes longer to begin working and provides relief for up to 12 h. Indacaterol is a new long-acting β2-agonist which provides both a rapid onset of action and 24 h control (EMA 2009a, b) in the maintenance treatment of airflow obstruction in patients with COPD.

Spirometry is a pulmonary function test used to diagnose respiratory disorders and assess the efficacy of bronchodilators. It measures the volume and flow rate of air that can be inhaled or exhaled. These quantities are captured in the metrics forced vital capacity (FVC) and forced expiratory volume in 1 second (FEV1). Figure 11.1 shows the spirometric time course used to derive these key metrics. Results are usually given both as raw data (liters, liters per second) and as percent predicted, i.e., the test result as a percentage of the “predicted values” for patients of similar characteristics (height, age, sex, and sometimes race and weight). The ratio FEV1/FVC is a measure of airflow limitation. A post-bronchodilator FEV1 <80% of the predicted value in combination with an FEV1/FVC <70% confirms the presence of airflow obstruction (either COPD or asthma). An improvement in FEV1 of ≥12% (either spontaneously or after inhalation of a bronchodilator and/or following a 2-week course of oral corticosteroids) indicates reversibility, and therefore suggests a diagnosis of asthma.

Fig. 11.1 Spirometry measures used to assess lung function

The FEV1 and FVC thresholds used for diagnostic purposes are sufficiently far apart to allow a reasonable chance of correct diagnosis given the inherent variability in these measures. In recent years, FEV1 has also become the metric of choice to support dose selection of new bronchodilators. In contrast to their use as diagnostics, less attention has been paid to the performance of FEV1 as a differential marker of the bronchodilatory response.

This chapter uses graphics to explore the properties of the FEV1 signal as a basis for the assessment of the dose response of a bronchodilator. The following issues are explored:

  • The inherent variability of FEV1 and its consequences as a biomarker of the bronchodilatory response

  • The traditional approach to dose ranging based on pairwise comparisons of active dose and placebo

  • Model-based characterization of the bronchodilatory dose response

Many of the thoughts captured in this chapter evolved from experience gathered during the development of the novel bronchodilator, indacaterol. The graphics presented hereafter were originally used at various stages of the development program and for this reason are not always uniform with respect to appearance or display.

2 FEV1 As a Marker for Dose Response Assessment

FEV1 is a non-invasive, objective, quick and cheap measure of bronchodilator activity. Despite these appealing characteristics, it suffers from an important drawback: it has a low signal to noise ratio (SNR), i.e., the drug-related change in FEV1 is small relative to the background within- and between-patient variability in the measure.

To compare the degree of variability to the magnitude of the dose response relationship of a bronchodilator, Fig. 11.2 presents pooled raw trough FEV1 data (black dots, left plot) from two dose ranging trials of indacaterol. The red line is a smooth curve that captures the average change in response with dose. A reader unfamiliar with these measures might be forgiven for missing any evidence of a dose response. The right plot zooms in on the dose response curve, showing it on the scale on which it is typically presented, to give an impression of the SNR. While the raw trough FEV1 data range between about 0.5 and 3 L, the mean response across doses ranges only between about 1.25 and 1.45 L. In other words, the approximately 0.2 L drug-related signal is located in 3 L of noise.

Fig. 11.2 Raw trough FEV1 data vs. dose on original scale (left). Zoom of dose response (right)

An initial conclusion from these figures is that it is very difficult to depict the dose response based on FEV1 for a bronchodilator using even large quantities of raw data alone due to the low SNR. In order to tease out a more useful signal for the purposes of graphical representation of the data, it is necessary to improve the signal by removing sources of variability.

Figure 11.3 presents the estimated SNR in FEV1 based on an analysis of covariance (ANCOVA) of 34,615 individual steady-state trough FEV1 measurements from 5,558 patients in 9 studies across the indacaterol development program. Patient characteristics, defined by baseline FEV1 and reversibility, account for much of the total variability. Having taken all covariates into account, about 210 mL of variability remains unexplained. This provides the denominator for the SNR estimates in Fig. 11.3. Placing this noise in the context of the indacaterol dose response indicates that the SNR for differentiation of most doses is less than 15%. For example, the signal in the differentiation of the 75 and 150 μg doses is only about 1/10th of the noise. Even the largest dose of indacaterol, 600 μg, has an SNR of less than 1 when compared to placebo. As we shall see below, this has consequences for the precision with which active doses can be differentiated in reasonably sized trials.

Fig. 11.3 Signal to noise ratio for differences in trough FEV1 between various doses in the indacaterol dose response
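The SNR arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming the SNR is the mean difference between two treatments divided by the residual standard deviation; the 210 mL noise figure is taken from the text, while the two mean differences are hypothetical values chosen only to show the calculation.

```python
# Residual variability after ANCOVA adjustment, as quoted in the text (mL).
RESIDUAL_SD_ML = 210.0


def snr(signal_ml: float, noise_ml: float = RESIDUAL_SD_ML) -> float:
    """Signal-to-noise ratio: mean treatment difference over residual SD."""
    if noise_ml <= 0:
        raise ValueError("noise must be positive")
    return signal_ml / noise_ml


# Hypothetical mean differences in trough FEV1 (mL); NOT values from the trials.
examples = {
    "150 vs 75 ug (hypothetical 20 mL)": 20.0,
    "600 ug vs placebo (hypothetical 180 mL)": 180.0,
}

for label, difference_ml in examples.items():
    print(f"{label}: SNR = {snr(difference_ml):.2f}")
```

With a noise level of 210 mL, any difference between adjacent doses of a few tens of milliliters yields an SNR of the order of one tenth, which is the situation the text describes.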

Baseline FEV1, i.e., FEV1 measured prior to treatment, accounts for a major part of total variability, and for this reason, it must be accounted for when assessing the response to a bronchodilator. Figures 11.4 and 11.5 show the impact of baseline normalization. Figure 11.4 shows the individual 24 h profiles (black lines) for a placebo and indacaterol dose, respectively, in a COPD population. The response in the indacaterol cohort is hardly discernible from the placebo cohort at the population mean level (red line) due to the large variability. Figure 11.5 shows the impact on these profiles of subtracting the baseline FEV1 value for each patient. Having now accounted for the biggest source of between-patient variability, the drug-induced average increase in FEV1 becomes more apparent.

Fig. 11.4 FEV1 profile after a single dose of placebo (left) and indacaterol (right)

Fig. 11.5 Baseline corrected FEV1 profile after a single dose of placebo (left) and indacaterol (right)

The next step in increasing the SNR is the correction for systematic deviations due to circadian variability or other extraneous systematic study-related variability. In a review of 4,756 pulmonary function tests from individuals who required one for any reason, Medarov et al. (2008) reported a diurnal variation of 17.6% between the lowest (1.80 L) and highest (2.12 L) mean FEV1 values. The circadian variability represents the natural change in broncho-responsiveness over the course of the day. Typically, the FEV1 nadir is reached shortly before waking; FEV1 then approaches a peak by mid-morning, from where it descends slowly to its nadir over the remainder of the day. To account for these systematic changes, the baseline corrected curves are adjusted by subtracting the placebo profile. Figure 11.6 presents the impact of placebo correction on the FEV1 profiles for a range of indacaterol doses. The biggest placebo-induced changes occur in the morning, when the curve rises rapidly from the nadir to the peak. This is also the time at which trough FEV1 is assessed. Any discrepancies in the time of dose relative to the sampling time or in sampling times across patients will inflate variability in the corrected FEV1 measures.

Fig. 11.6 Baseline corrected mean FEV1 profiles before (left) and after (right) placebo correction
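The two correction steps described above (subtracting each patient's baseline, then subtracting the mean placebo profile) can be sketched as follows. This is an illustration only: the function names are hypothetical, the data are invented, and a simple pointwise mean stands in for the covariate-adjusted analysis used in the actual program.

```python
from statistics import mean


def baseline_correct(profile_l, baseline_l):
    """Subtract one patient's pre-treatment FEV1 (L) from each post-dose value."""
    return [value - baseline_l for value in profile_l]


def mean_profile(profiles):
    """Pointwise mean over patients; all profiles must share one time grid."""
    return [mean(values) for values in zip(*profiles)]


def placebo_corrected_effect(active_patients, placebo_patients):
    """Mean baseline-corrected active profile minus the placebo counterpart."""
    active = mean_profile([baseline_correct(p, b) for b, p in active_patients])
    placebo = mean_profile([baseline_correct(p, b) for b, p in placebo_patients])
    return [a - p for a, p in zip(active, placebo)]


# Hypothetical data: (baseline FEV1 in L, FEV1 in L at four post-dose times).
placebo_arm = [(1.20, [1.22, 1.30, 1.28, 1.21]), (1.60, [1.61, 1.70, 1.66, 1.62])]
active_arm = [(1.10, [1.25, 1.36, 1.33, 1.24]), (1.70, [1.86, 1.95, 1.92, 1.85])]

effect = placebo_corrected_effect(active_arm, placebo_arm)
print([round(x, 3) for x in effect])
```

In the invented data the two patients in each arm differ by about 0.5 L at baseline, yet after both corrections the remaining treatment effect is nearly flat over time, because the shared circadian rise has been removed with the placebo profile.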

For the purpose of dose selection in the development of new bronchodilators, the FEV1 response is assessed for a range of doses at steady state, over the course of a dosage interval. Typically, several metrics are considered: peak FEV1 to represent the maximum response expected in a dosage interval; area under the curve of serial FEV1 measurements within a dosage interval to give the average bronchodilation over the course of the day; and trough FEV1 measured just prior to the next administration, to give the minimum degree of bronchodilation.
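The three metrics can be derived from serial measurements within one dosage interval as sketched below. This is a minimal example under stated assumptions: the function name and the data are hypothetical, the area under the curve uses the linear trapezoidal rule, and the last sample is assumed to be the pre-dose (trough) measurement.

```python
def fev1_metrics(times_h, fev1_l):
    """Peak, trapezoidal AUC, time-averaged, and trough FEV1 for one interval."""
    if len(times_h) != len(fev1_l) or len(times_h) < 2:
        raise ValueError("need matching time and FEV1 series with >= 2 points")
    # Linear trapezoidal rule over consecutive sampling times.
    auc = sum(
        (t1 - t0) * (y0 + y1) / 2.0
        for t0, t1, y0, y1 in zip(times_h, times_h[1:], fev1_l, fev1_l[1:])
    )
    return {
        "peak_l": max(fev1_l),
        "auc_l_h": auc,
        "average_l": auc / (times_h[-1] - times_h[0]),
        "trough_l": fev1_l[-1],  # assumed: last sample precedes the next dose
    }


# Hypothetical steady-state profile over a 24 h dosage interval.
times = [0, 2, 4, 8, 12, 24]
values = [1.30, 1.50, 1.48, 1.42, 1.38, 1.34]
print(fev1_metrics(times, values))
```

Dividing the AUC by the length of the interval gives the time-averaged FEV1, which is on the same scale as peak and trough and is therefore easier to compare with them.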

Figure 11.7 presents the corrected mean trough FEV1 response to 150 μg indacaterol from 6 trials (one in each panel) with between 92 and 420 COPD patients at each study visit (blue dots). This analysis has taken into account the important covariates, such as baseline and reversibility, measured in each trial and accounts for the known major differences in patient characteristics. Despite this, considerable unaccounted within- and between-study variability remains. The median responses (red lines) across the 6 trials span 60 mL. Within trials, the mean trough FEV1 spans a maximum of about 50 mL (gray areas).

Fig. 11.7 Steady-state FEV1 response to 150 μg indacaterol across visits in six different studies

Given that baseline accounts for a major portion of difference between patients and is necessary to tease out treatment signals, it is important to understand its behavior over time when interpreting baseline corrected data. Figure 11.8 presents the change in trough FEV1 relative to baseline in 385 COPD patients treated with placebo on study days 1, 8, 15, and 29. Note: all FEV1 values are the mean of two adjacent measures 15 min. apart within each subject. The black lines join the observations in each patient to give an impression of the within-patient variability. The box and whisker plots show the distribution of values on the various study days. Several trends are evident:

Fig. 11.8 Fractional change in trough FEV1 relative to baseline in placebo-treated COPD patients over 28 days

  • The FEV1 values within any one patient can vary considerably from day to day in the absence of any drug. This is apparent from the varied trajectories of the black lines connecting the observations on the respective study days

  • Baseline correction is most effective for the trough value one day after commencement of treatment. However, with increasing time, the impact of baseline correction becomes diluted. This is apparent from the increasing width of the boxes over time

  • The dilution of baseline correction is noticeable beyond the first dose and increases continually over the course of the study

This simple exercise demonstrates that the imprecision of baseline corrected values can increase 2- to 3-fold relative to the response observed immediately after the first dose. This effect is mainly driven by unexplained within-patient variability in baseline. In the relatively large sample presented in Fig. 11.8, the confidence interval for the fractional difference from baseline will be over 5% by day 30, which adds about 60 mL of uncertainty to the drug-related response. This effect explains the common observation that a dose response relationship, presented as its baseline corrected value, is almost always more pronounced following the first dose than after subsequent doses. Figures 11.10, 11.11, and 11.12 bear out this point: while a clear dose response is observed on Day 2, the greater imprecision on subsequent days confounds dose differentiation.

In summary, this section shows that bronchodilator-induced changes in FEV1 are small relative to the natural between- and within-patient variability. In order to tease out the bronchodilator signal from the raw data, it is necessary to remove the major sources of variability such as baseline and circadian effects. However, even when the known sources of variability have been accounted for, large uncertainty in the underlying true bronchodilatory response remains. In particular baseline correction is most effective for accounting for variability following the first dose. Within-patient variability will dilute its corrective power as the duration of the study increases.

In order to provide a robust assessment of the underlying dose response of a bronchodilator, the analysis methods used will have to have adequate precision to tease out the low drug-related signal while appropriately accounting for the various sources of high variability.

3 Traditional Dose Ranging

As shown above, characterizing the bronchodilatory response requires significant manipulation of raw FEV1 data in order to tease out the drug-related signal. The traditional approach uses ANCOVA to account for the major sources of variability. For example, a typical ANCOVA analysis will use a statistical model to account for covariates such as treatment (dose), baseline FEV1, FEV1 reversibility to one or more bronchodilators, smoking status, country, and center in which the trial was carried out. By accounting for the known differences between patients and measurements, the SNR is improved.

The traditional ANCOVA approach to dose ranging compares active doses and placebo (contrast statistics) to determine the existence of a dose response and, if so, to select a target dose. If the difference between placebo and at least one of the doses is statistically significant, the presence of a dose–response relationship has been confirmed: i.e., there is a positive treatment effect. The target dose is then estimated as the smallest statistically significant dose which has an average effect that is clinically relevant according to a pre-specified value of clinical relevance (Bornkamp et al. 2007). Although the simplicity of this approach is appealing, it has the following issues:

  1. It requires the definition of a pre-specified value of clinical relevance. However, the FDA has not determined a minimal clinically important difference (MCID) for FEV1 for use in regulatory submissions (Michele 2011)

  2. The traditional trial is primarily designed to detect the presence of a dose response, i.e., to detect a significant treatment effect, but not to characterize it. The traditional trial is not designed to differentiate active doses from each other. Hence sample sizes are chosen to allow statistical differentiation from placebo only

  3. The low SNR associated with FEV1 in the bronchodilatory response means that the estimated responses at each dose level will lack the precision to differentiate active doses from each other and cannot easily be benchmarked against any target level of response
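The traditional selection rule described before this list (the smallest dose that is statistically significant versus placebo and whose mean effect reaches a pre-specified relevance threshold) can be sketched as follows. The class and function names, the contrasts, and the 120 mL threshold used in the example are illustrative assumptions, not values from the trials; significance is approximated here by a 95% CI lower limit above zero.

```python
from typing import NamedTuple, Optional, Sequence


class Contrast(NamedTuple):
    """Placebo contrast for one dose: point estimate and 95% CI, in mL."""
    dose_ug: float
    estimate_ml: float
    ci_low_ml: float
    ci_high_ml: float


def traditional_target_dose(
    contrasts: Sequence[Contrast], relevance_ml: float
) -> Optional[float]:
    """Smallest dose that is significant vs placebo and clinically relevant."""
    for c in sorted(contrasts, key=lambda item: item.dose_ug):
        significant = c.ci_low_ml > 0.0          # CI excludes zero
        relevant = c.estimate_ml >= relevance_ml  # mean effect reaches threshold
        if significant and relevant:
            return c.dose_ug
    return None


# Hypothetical contrasts with a CI half-width of 50 mL at every dose.
hypothetical = [
    Contrast(18.75, 70.0, 20.0, 120.0),
    Contrast(37.5, 90.0, 40.0, 140.0),
    Contrast(75.0, 125.0, 75.0, 175.0),
    Contrast(150.0, 140.0, 90.0, 190.0),
]
print(traditional_target_dose(hypothetical, relevance_ml=120.0))
```

The rule looks only at each point estimate and its own interval, so a point estimate that lands on the wrong side of the threshold by chance changes the selected dose; the issues listed above concern exactly this sensitivity.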

The third issue is a question of sample size. Therefore, it is instructive to consider the relationship between precision and group size for typical bronchodilator studies as presented in Fig. 11.9. Precision here is defined as the distance from the point estimate to the limit of the 95% CI (i.e., half the length of the interval). The sample size is determined such that the precision is obtained with 95% coverage probability, i.e., in 95% of the studies one would expect to see a precision less than what is reported on the graph.

Fig. 11.9 Distance (in mL) from point estimate to limit of 95% CI vs. sample size (per group)

For example, a sample size of 100 patients per group would result in a 95% CI that extends no further than 68 mL (half the interval) from the observed point estimate with 95% chance. While this precision is acceptable for discriminating a particular dose level from placebo, it lacks the precision to reliably construct a graded dose response relationship. To discriminate doses with the precision necessary to reconstruct a graded dose response, much larger sample sizes would be required. For example, Fig. 11.9 shows that 500 patients per group would provide a precision of no more than 30 mL (half the interval) with 95% chance.
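The relationship between group size and precision can be approximated with a short calculation. The sketch below rests on several assumptions that are not stated in the chapter: a residual SD of 210 mL (borrowed from the ANCOVA in Sect. 2), a two-group comparison of means with equal group sizes, a normal rather than t quantile, and the Wilson-Hilferty approximation for the chi-square quantile that supplies the 95% coverage probability. Because the SD and method behind Fig. 11.9 are not given, the results land in the same region as the 68 mL and 30 mL read from the figure but are not expected to reproduce them.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chi2_quantile(p: float, df: int) -> float:
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = STD_NORMAL.inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * sqrt(a)) ** 3


def half_width_ml(n_per_group: int, sd_ml: float = 210.0,
                  conf: float = 0.95, coverage: float = 0.95) -> float:
    """Half-width of the CI for a difference in means that is not exceeded
    with probability `coverage`, given an assumed residual SD."""
    df = 2 * (n_per_group - 1)
    z = STD_NORMAL.inv_cdf(1.0 - (1.0 - conf) / 2.0)
    # Upper bound on the estimated SD at the requested coverage probability.
    sd_bound = sd_ml * sqrt(chi2_quantile(coverage, df) / df)
    return z * sd_bound * sqrt(2.0 / n_per_group)


for n in (50, 100, 200, 500):
    print(f"n = {n:3d} per group: half-width <= {half_width_ml(n):.0f} mL")
```

The half-width shrinks only with the square root of the group size, which is why a fivefold increase in patients roughly halves the interval rather than reducing it fivefold.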

The precision with which the response to adjacent doses can be estimated is the measure by which we can reliably reconstruct the graded response over the dose range. The higher the precision, the easier it will be to robustly identify the threshold to the plateau of the dose response relationship. By the same token, the lower the precision, the higher the likelihood that too low a dose will be identified as being significantly different from the plateau of the dose response.

Figure 11.10 is a typical representation of the results of a traditional dose ranging trial in COPD, with about 86 patients per treatment arm. It depicts the baseline corrected longitudinal response after the first dose and last dose for a range of indacaterol doses, a positive control salmeterol and placebo. A figure very similar to this was recently published (Chowdhury et al. 2011) to discuss the dose response relationship of indacaterol. Unfortunately, this graphical presentation that focuses on the point estimates ignores the uncertainty in their estimation. Without this information, it is impossible to draw any statistically meaningful conclusions about possible treatment differences, never mind any discussion of the evolution of treatment effects across study days.

Fig. 11.10 FEV1 profile after first (1) and last (14) dose from a traditional dose ranging trial for indacaterol in COPD

A clearer presentation of the summary data is achieved by focusing on a cross-sectional assessment of a particular metric, such as trough FEV1, providing the point estimates together with confidence intervals. Figure 11.11 presents the ANCOVA derived placebo contrasts for trough values from the same study shown in the previous figure, whereby the Day 2 values are the trough observations 24 h after the first dose and the Day 15 values are the corresponding troughs after the 14th (last) dose. The bars with numbers correspond to the tested indacaterol doses; SME is the response to the positive control salmeterol.

Fig. 11.11 Trough FEV1 least squares mean estimate (placebo contrasts) after first and last dose from a traditional dose ranging trial for indacaterol in approximately 86 COPD patients per treatment group

Treatment-related changes in response by dose can now be interpreted in the context of the uncertainty in the estimates. The confidence intervals for all responses on Day 15 are larger than the confidence intervals on Day 2, and they all overlap. The fluctuations in all indacaterol treatments are of similar magnitude to the fluctuations in the positive control. In other words, the least squares mean estimates at steady state lack the necessary precision to allow any statistically meaningful differentiation between the active doses. On comparing the responses on both study days, it is apparent that the responses of some doses increase and others decrease, and all are within the range of the confidence intervals of approximately ±50 mL.

In the belief that the greater responsiveness of asthma patients to β2-agonists would allow better differentiation of doses, the FDA requested dose exploration in this population (Chowdhury et al. 2011). Figure 11.12 presents the results of a traditional dose ranging trial carried out in asthma patients. Note that this trial is identical in design to the one shown in the previous figure except for the patient population.

Fig. 11.12 Trough FEV1 least squares mean estimate (placebo contrasts) after first and last dose from a traditional dose ranging trial for indacaterol in 86 asthma patients per treatment group

While the absolute response in asthma patients is greater, as expected, the associated confidence intervals are disproportionately wider. Hence, whatever might be gained in terms of signal is lost in terms of noise. Performing dose ranging in asthma patients confers little or no advantage whatsoever in terms of dose finding. The performance of the positive control (salmeterol) across study days highlights the difficulty in interpreting the data; between Day 2 and Day 15, there is an 80 mL reduction in response which is more than half the maximum response noted for indacaterol in this trial; this is not a statistically significant difference. It is worth noting that given the width of the confidence intervals, it is perfectly possible that the dose of 18.75 μg could have produced a response larger than all the other doses simply by random chance. This underlines the fact that the traditional approach to the assessment of dose response lacks the precision to adequately differentiate doses of a bronchodilator.
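The claim that the lowest dose could produce the numerically largest response by chance can be explored with a small simulation. The sketch below is illustrative only: the true mean responses and standard errors are hypothetical, and the estimates are drawn independently, which ignores the correlation that arises because all contrasts share one placebo arm.

```python
import random


def prob_lowest_dose_wins(true_means_ml, se_ml, n_sim=20000, seed=1):
    """Monte Carlo estimate of the probability that the first (lowest) dose
    shows the largest observed response, assuming independent normal errors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sim):
        observed = [rng.gauss(mu, se_ml) for mu in true_means_ml]
        if observed[0] == max(observed):
            wins += 1
    return wins / n_sim


# Hypothetical true placebo contrasts (mL) for five increasing doses.
true_means = [100.0, 110.0, 120.0, 130.0, 140.0]

for se in (10.0, 25.0, 40.0):
    p = prob_lowest_dose_wins(true_means, se)
    print(f"SE = {se:.0f} mL: P(lowest dose ranks first) = {p:.3f}")
```

As the standard error grows relative to the spacing between the true means, the ranking of the observed responses approaches a lottery, which is the situation the wide confidence intervals in the asthma trial point to.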

As predicted from Fig. 11.8, the confidence intervals are consistently wider on Day 15 compared to Day 2. This is most likely due to the impact of within-patient variability which dilutes the corrective power of the baseline. The poorer precision on Day 15 further confounds the assessment of the response estimates.

The sobering conclusion of this brief graphical presentation is that dose differentiation based on this endpoint using traditional methodologies is not robust. The traditional trials are designed to differentiate active doses from placebo and not active doses from one another. Furthermore, it is not feasible to adequately expand the trial size for the differentiation of a sufficient number of doses necessary to estimate a graded dose response. The inherent variability in FEV1 relative to the treatment difference is of a magnitude that makes precise treatment differentiation nearly impossible using traditional methodology. Replacing COPD patients with asthma patients does not provide a solution.

In summary, these examples demonstrate that the traditional approaches to dose ranging and dose selection of bronchodilators preferred by some regulatory authorities do not provide a rational basis for the differentiation and selection of doses. In the examples presented, any of the doses tested could have produced the numerically largest response given the imprecision of the chosen approach. In the search for differentiation, it is likely that the traditional approach will be biased toward selecting too low a dose for the simple reason that the bigger the difference in the doses, the more likely that differentiation will be apparent.

4 Indacaterol Dose Selection

The indacaterol program, following regulatory precedent and guidance, relied heavily on traditional empirical approaches to selecting doses. The emphasis of the initial approach was to identify the lowest doses that were superior to the comparators formoterol and tiotropium in terms of trough and peak FEV1. Note that the inclusion of peak was encouraged by the FDA. The selection rule aimed to identify the minimum dose from the range of 75, 150, 300, and 600 μg that had a response numerically superior to the comparators. Both the minimum and next highest doses were tested together in a longer extension of the trial.

Note that the highest doses of 300 and 600 μg were independently tested in a year-long trial in order to establish the safety of the highest available doses, irrespective of the ultimate clinical doses.

Figure 11.13 depicts the outcome of the primary dose selection trial. Based on the criterion of numerically exceeding the trough values of the comparators, the doses of 75 and 150 μg would have qualified; however, on the basis of the peak metric, the doses of 150 and 300 μg were selected. These were the doses subsequently approved in most countries around the world.

Fig. 11.13 Dose selection trial for indacaterol in COPD

On submission of these data to the FDA, a request was made to better explore efficacy at lower doses. For this purpose, the dose ranging trials depicted in Figs. 11.11 and 11.12 were carried out and were central to the FDA decision for approval of the 75 μg dose (Chowdhury et al. 2011).

In the course of the development program, the shortcomings of the traditional approach to dose selection of bronchodilators started to become more apparent. For this reason, alternative approaches were explored to provide a more rational and robust means of dose selection (Renard et al. 2011), prior to the submission to the FDA. The remainder of this article provides a graphical exploration of this approach.

5 Estimation of the Dose Response Relationship

The traditional approach to dose ranging assesses the response to each dose independently and tests whether each dose is different from placebo. In other words, the traditional approach simply detects whether a dose response is present, but does not explicitly characterize it (Bornkamp et al. 2007). However, if sufficiently many doses have been tested, it is not uncommon for a dose response relationship to be inferred from such an analysis, even though no such relationship has been, or was ever intended to be, estimated. Chowdhury et al. (2011), for instance, inferred a dose response relationship from a study that was not designed to have adequate precision to support such inference (see Figs. 11.11 and 11.12).

A more rational approach is to use our knowledge of the underlying pharmacology of bronchodilators to determine the most likely shape of the dose response relationship and use a mathematical model to approximate this functional form based on the available data across all dose levels. It is known that direct acting bronchodilators cause increased bronchodilation with increasing dose until a maximum threshold is achieved.

Figure 11.14 depicts the primary analysis results for the 12 major COPD efficacy trials carried out in the indacaterol development program. Each point represents the ANCOVA derived least squares mean contrast trough FEV1 at each dose level assessed across the trials; this figure summarizes data from 8,111 patients studied for up to 6 months. The horizontal line represents the minimum clinically important difference (MCID) as defined by some authors (Cazzola et al. 2008). The vertical lines represent the 95% confidence intervals of each estimate and indicate the large uncertainty in even the largest trials. In contrast to the raw data in Fig. 11.2, a dose response signal is apparent: with increasing dose, trough FEV1 appears to increase to a plateau.

Fig. 11.14 Trough FEV1 placebo contrasts for the primary statistical analysis at each dose level in 12 studies in COPD patients

On fitting a mathematical model to such data, it is possible to provide an explicit estimation of the graded dose response relationship which allows the relative potency of each dose to be calculated. Once an explicit estimate of the dose response is available, it is straightforward to identify the dose that is on the threshold of the plateau of the dose response. Typically, this is the dose that attains up to 80–90% of the maximum response (or some other clinically relevant target). In other words, a model-based approach turns the dose selection process into a calibration problem: it allows efficient and robust identification of the minimum dose that attains optimal efficacy, given an adequate safety threshold.
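As a worked illustration of this calibration step, the relations below assume the simple hyperbolic form of the Emax model named in Sect. 6, with the Hill coefficient fixed at 1 and a placebo-corrected response (so the response at dose zero is zero). Both are assumptions made here for illustration, and the symbols E, D, E_max, ED_50, and ED_p are notation introduced for this sketch.

```latex
E(D) = \frac{E_{\max}\, D}{ED_{50} + D},
\qquad
\frac{E(ED_{p})}{E_{\max}} = p
\;\Longrightarrow\;
ED_{p} = ED_{50}\,\frac{p}{1-p},
\qquad 0 < p < 1 .
```

Under this form, the doses attaining 80% and 90% of the maximum response are 4 and 9 times ED_50, respectively, so the threshold to the plateau is fixed once ED_50 has been estimated.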

6 Application of Model-Based Approaches to Indacaterol Dose Selection

Two retrospective approaches were taken for the model-based analysis of the indacaterol dose response (Renard et al. 2011):

  • A Bayesian study-level meta-analysis using summary level study data

  • A nonlinear mixed effects analysis of patient level data

Readers are referred to the original article (Renard et al. 2011) for precise details. For the purpose of this graphical exploration, only the study-level analysis will be considered here.

Typically, model-based approaches use the raw patient level data to derive an estimate of the dose response relationship. However, in the case of the indacaterol program, the large number of long trials provided a rich database of study level data which allowed estimation of the dose response from the summary level data while accounting for both within- and between-trial variability.

Specifically, the least squares mean placebo contrast estimates for each dose level (between 18.75 and 600 μg), visit and trial were included in an Emax dose response model. The complete analysis included data from 12 trials comprising 8,111 patients. Positive control data were also included to allow benchmarking against important comparators.

Since all trials had similar inclusion/exclusion criteria and the ANCOVA analysis used to derive the least squares mean estimates accounted for major known sources of variability, a level playing field was created for the purpose of the model-based meta-analysis. Figure 11.15 provides an overview of the study-level analysis data set. This figure is essentially an expansion of Fig. 11.14 that now captures all visits in each trial. Box plot summaries of the data have been superimposed on the points to give an impression of the shape of the underlying dose response relationship. It would be safe to conclude that with increasing dose, the response approaches a plateau; this is expected for the dose response relationship of a β2-agonist. The dotted line marked MCID indicates the minimum clinically important difference, a threshold of 120 mL believed by some experts to be the degree of bronchodilation that should be exceeded to achieve optimal clinical benefit (Cazzola et al. 2008).

Fig. 11.15 Overview of data used in study level dose response analysis

The final analysis of these data, including the comparators formoterol (12 μg), salmeterol (50 μg), and tiotropium (18 μg), is presented in Fig. 11.16. This analysis allows the dose response relationship to be explicitly estimated while accounting for the considerable within- and between-study variability. It is apparent that 150 μg is located on the threshold of the plateau and that it is the minimum dose that provides a consistent advantage over the comparators.

Fig. 11.16 Dose response relationship based on the totality of the study level data

Given the explicit characterization of the dose response, it is possible to rank the doses on the effective dose scale that calculates the percentage of the maximum response achieved by a given dose. Table 11.1 presents the results of the analysis. Key to dose selection is the identification of the threshold to the plateau of the dose response; below this point, the response will drop off very rapidly, and above this point little additional benefit is achieved. From the table, we see that 75 μg represents the ED74, 150 μg the ED85, and 300 μg the ED92. In other words, these doses span the transition to the plateau of the dose response relationship.

Table 11.1 Results of the study level dose response analysis
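The effective dose scale quoted in the text can be checked with a short calculation. The sketch below assumes a simple Emax model with a Hill coefficient of 1, which the chapter does not state explicitly, and back-calculates ED50 from the statement that 75 μg is the ED74; the resulting ED50 of roughly 26 μg is therefore an inference made here, not a reported estimate. The function names are hypothetical.

```python
def fraction_of_max(dose_ug: float, ed50_ug: float) -> float:
    """Fraction of the maximum response under a simple Emax model (Hill = 1)."""
    return dose_ug / (dose_ug + ed50_ug)


def ed50_from_point(dose_ug: float, fraction: float) -> float:
    """ED50 implied by one (dose, fraction of maximum) pair under the same model."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return dose_ug * (1.0 - fraction) / fraction


# Back-calculated from "75 ug represents the ED74"; an assumption, not a fit.
ed50 = ed50_from_point(75.0, 0.74)
print(f"implied ED50 = {ed50:.1f} ug")

for dose in (18.75, 37.5, 75.0, 150.0, 300.0, 600.0):
    pct = 100.0 * fraction_of_max(dose, ed50)
    print(f"{dose:6.2f} ug -> {pct:.0f}% of maximum response")
```

Under these assumptions the calculation returns about 85% for 150 μg and about 92% for 300 μg, in line with the ED85 and ED92 quoted in the text, which suggests that the simple hyperbolic form is at least consistent with the reported ranking.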

Placing the response predictions in the context of a graded dose response, the MCID and the comparator data, provides a clear means of assessing the relative benefit of the various doses. Figure 11.17 presents a ranking of the doses according to their predicted response. In contrast to the traditional approach presented in Figs. 11.11, 11.12, and 11.13, the model based estimates are much more precise and allow differentiation of doses. This increase in precision has been attained by pooling information across doses to estimate an overall dose response relationship. Based on the study level analysis presented in this figure, it is possible to state the following:

Fig. 11.17 Ranking of trough FEV1 responses based on dose response analysis

  • There is a 92% probability that 37.5 μg is below the MCID of 120 mL

  • There is a 95% probability that 75 μg exceeds the MCID

  • 150 μg has an incremental benefit over 75 μg and is the lowest indacaterol dose that exceeds the average bronchodilation observed for the comparators

  • 150 μg is located mid-way between the MCID and the maximum response and exceeds all comparators

  • 300 μg intersects the maximum response
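Probability statements of this kind can be sketched as follows, assuming that the uncertainty in a predicted response is approximately normal. Only the MCID of 120 mL comes from the text; the mean and standard error in the usage line are hypothetical placeholders, not values from the analysis, and the function names are illustrative.

```python
from statistics import NormalDist

MCID_ML = 120.0  # MCID for trough FEV1 as quoted in the text


def prob_above(mean_ml: float, se_ml: float, threshold_ml: float = MCID_ML) -> float:
    """P(response > threshold) under a normal approximation to the estimate."""
    return 1.0 - NormalDist(mu=mean_ml, sigma=se_ml).cdf(threshold_ml)


def prob_below(mean_ml: float, se_ml: float, threshold_ml: float = MCID_ML) -> float:
    """P(response < threshold); the complement of prob_above."""
    return 1.0 - prob_above(mean_ml, se_ml, threshold_ml)


# Hypothetical predicted response: mean 140 mL with a standard error of 10 mL.
p_exceeds = prob_above(140.0, 10.0)
print(f"P(response > MCID) = {p_exceeds:.3f}")
```

The narrower the interval around a predicted response, the more decisive such a statement becomes, which is why the precision gained from pooling matters for dose differentiation.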

The ability to characterize the indacaterol dose response in this precise manner bears testimony to the power of the study level dose response analysis. It would not be possible to derive information of this quality and precision in any single trial of practical size.

It is important to stress that the purpose of this presentation is not to question the FDA’s choice of the 75 μg dose of indacaterol, which is based on the Agency’s risk benefit assessment (Chowdhury et al. 2011). The model based analysis shows that 75 μg is the lowest tested dose that both exceeds the MCID and is as good as the best comparator tested. For this reason, Novartis proposed 75 μg as the lowest dose in its submission to the FDA, while claiming that incremental benefit can be achieved at higher doses. However, it is the aim of this publication to highlight the shortcomings of the traditional approach to dose ranging favored by the Agency. It is apparent that the dose response data as presented in Figs. 11.11, 11.12, and 11.13 do not allow any statistically meaningful differentiation of doses. Given the confidence intervals, it is perfectly possible that doses of 18.75 or 37.5 μg could, by chance, produce responses greater than any of the higher doses in the traditional dose ranging trials. It is not clear how the Agency would have reacted had this possibility actually arisen. Indeed, the large differences observed between the Day 2 and Day 15 responses for the positive control, salmeterol, in both the asthma and COPD trials (Figs. 11.11 and 11.12) suggest that such spurious results are likely to occur in the traditional design and analysis. To avoid confounding the dose selection process with spurious results, it is necessary to design studies and analysis methods that are capable of appropriately handling the low SNR in FEV1. Placing the process of dose selection in the context of explicit dose response assessment greatly increases the probability of efficiently selecting the optimal dose.

7 Sensitivity Analyses

Despite the advantages of the study-level analysis presented above, it is nevertheless a retrospective meta-analysis and hence requires qualification. Such qualification or sensitivity analyses assess the robustness of the results to various assumptions or natural constraints of the analysis. For example, in a meta-analysis, it is typical to test the sensitivity of the results to the sample of included studies.

The history of the program provided a natural means for such a sensitivity analysis. Prior to the first submission to the FDA, the available efficacy trials were included in a study level meta-analysis of the dose response. After the first submission, the Agency requested studies with lower doses. By the time of resubmission, a further 6 studies had become available with more data at lower doses, and these were analyzed independently using the same methodology. Given the excellent correspondence between the two independent analyses, all studies were pooled in the final analysis as presented in Fig. 11.16.

Figure 11.18 shows the graphical comparison of all 3 analyses. Although no doses lower than 75 μg were available for the first analysis, it nevertheless allows an estimate of the dose response that is consistent with the second analysis that mainly included studies with lower doses. As expected, the analysis of all 12 trials lies between the previous 2 and its confidence interval includes the mean predictions of both.
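A simplified sketch shows why a pooled estimate falls between its two components and is more precise than either. It uses fixed-effect inverse-variance weighting, which is a stand-in for, not a reproduction of, the model-based joint fit described here. The estimates and standard errors in the usage line are hypothetical.

```python
def pool_inverse_variance(estimates, std_errors):
    """Fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = total_weight ** -0.5
    return pooled, pooled_se


# Hypothetical predictions (mL) for one dose from two independent analyses.
pooled, pooled_se = pool_inverse_variance([150.0, 170.0], [10.0, 10.0])
print(f"pooled = {pooled:.1f} mL, SE = {pooled_se:.2f} mL")
```

Because the pooled value is a weighted average with positive weights, it always lies between the smallest and largest input, and its standard error is smaller than that of any single input.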

Fig. 11.18 Sequential analysis of first six studies (black points), last six studies (blue points), and all 12 studies together

Given that only one study included doses lower than 75 μg, it was considered important to assess how the overall analysis would have been affected had the results of this low dose study been different. For this purpose, a sensitivity analysis was carried out whereby the responses of all doses in the low dose study were adjusted relative to the response of the 150 μg dose. For example, the responses of the low dose study were all adjusted by the fractional deviation of its 150 μg response from the 25th, 50th, and 75th percentiles of the population estimate for the response to this dose. Figure 11.19 presents the results of the analysis where the responses of the low dose study were adjusted upward by the fractional deviation of the 150 μg dose from the third quartile of the population response for this dose. As expected, significantly increasing all the responses of the low dose trial caused the estimated dose response to increase, but not by a margin that would alter the conclusions of the original analysis. This analysis provided confidence that the rest of the data set (without the lowest doses) has adequate information to provide a robust estimate of the dose response.
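One reading of this adjustment is a proportional rescaling: every response in the low dose study is multiplied by the ratio of a chosen population quantile at 150 μg to the study's own 150 μg response. The sketch below illustrates that reading only. All responses and quartiles are hypothetical placeholders, and the function name is illustrative.

```python
def scale_study_to_target(responses_ml, reference_dose_ug, target_ml):
    """Rescale all responses so that the reference dose response equals the target."""
    factor = target_ml / responses_ml[reference_dose_ug]
    return {dose: resp * factor for dose, resp in responses_ml.items()}


# Hypothetical responses (mL) by dose (ug) in the low dose study.
low_dose_study = {18.75: 80.0, 37.5: 100.0, 75.0: 120.0, 150.0: 150.0}

# Hypothetical quartiles (mL) of the population response at 150 ug.
population_quartiles_150 = {"q25": 135.0, "q50": 150.0, "q75": 165.0}

# One adjusted copy of the study per quartile scenario.
scenarios = {
    name: scale_study_to_target(low_dose_study, 150.0, target)
    for name, target in population_quartiles_150.items()
}
print(scenarios["q75"])
```

Each adjusted copy of the study would then replace the original in the pooled data set before the dose response is re-estimated, so that the shift in the fitted curve can be compared across scenarios.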

Fig. 11.19 Sensitivity analysis: the response of the low dose study is adjusted to correspond to 3 quartiles of the population estimate

8 Summary and Conclusions

FEV1 is commonly used as a diagnostic tool for respiratory disorders. It is simple to interpret and has adequate SNR to assist the diagnosis of conditions such as asthma or COPD. However, graphical analysis has shown that the low SNR of FEV1 as a marker of the bronchodilatory response becomes problematic when assessing dose response relationships.

In the assessment of bronchodilator efficacy, the problem of the low SNR is compounded by the poor precision of the traditional approaches to dose ranging trials advocated by some regulatory authorities. While these trials are adequate for detecting a dose response, they lack the precision to adequately differentiate active doses. Simply increasing the trial size to compensate for the poor precision is not a viable option given the number of patients that would be required to cover all doses necessary to characterize a dose response relationship.
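The scale of the problem can be sketched with the standard normal-approximation sample size formula for comparing two means, n = 2(z₁₋α/₂ + z_power)²σ²/δ² per arm. The sketch assumes a parallel-group comparison of two doses with a common standard deviation, which may not match the designs of the trials discussed here. The standard deviation and dose difference in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(sd_ml: float, delta_ml: float, alpha: float = 0.05, power: float = 0.9) -> int:
    """Patients per arm needed to detect a mean difference delta_ml (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 * (sd_ml / delta_ml) ** 2)


# Hypothetical inputs: between-patient SD of 250 mL, 30 mL difference between doses.
print(n_per_arm(sd_ml=250.0, delta_ml=30.0))
```

Because the required size grows with the square of σ/δ, halving the difference to be detected roughly quadruples the number of patients per arm, and this requirement applies to each pair of adjacent doses.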

In the case of indacaterol, it was shown that there was considerable variability among the indacaterol doses and positive controls on the respective trial days, and that the confidence intervals of all doses tested by the traditional approaches overlapped. It is concluded that the traditional approach cannot distinguish active doses from one another in any statistically meaningful manner. Given the large between- and within-trial variability, it is difficult to conclude that dose selection decisions based on such methodology alone can be robust.

A model-based approach that provides an explicit estimation of the dose response was presented to support dose selection. This approach used the totality of the data from all major trials to estimate the dose response relationship while accounting for the within- and between-trial variability. Pooling information across the program in this manner allowed precise estimation of the population dose response and provided a robust basis to support dose selection.

The study design and analysis methods used to support dose selection must be tailored to account for the properties of the underlying data. Placing the dose selection process of bronchodilators in the context of an explicit estimation of dose response greatly increases the chances of robustly and efficiently identifying the optimal dose.