Introduction

Clinical trial simulation (CTS) may be used to assess how different design and drug factors may affect trial performance. These factors may be controllable trial design properties, e.g., the doses studied, the duration of treatment, or uncontrollable drug characteristics, such as its pharmacokinetic or pharmacodynamic models and parameters. Other influencing factors may include the progression of disease over time or subject specific characteristics that may be related to disease progression or treatment response. Provided there are specific decision rules for determining that a particular trial was positive, or for judging an estimate to be sufficiently accurate, CTS can provide a rational basis for making decisions about the trial design and quantitating how effectively the design can answer the study objectives (1,2). Clinical trial simulation can be viewed as an extension of conventional statistical design evaluation. Data derived models are utilized, based on the relationship between dose, exposure, the time course of disease progression, placebo effect, and the outcome measure, providing an alternative approach to that described in the statistics literature (3).

A CTS study is presented which guided the design of a proof of concept (POC) trial for an M1 muscarinic agonist (CI-1017) being developed for the symptomatic treatment of Alzheimer's disease (AD) (4). M1 receptors are abundant in the hippocampus and cortex, where they play an important role in learning and memory, and are an attractive target for cognitive enhancement in AD (5,6). Direct acting muscarinic agonists may also be effective in modulating tissue levels of the amyloid β peptide, which in its aggregated form is highly toxic to neurons and is a major constituent of senile plaques characteristic of the neuropathological diagnosis of AD (7).

Because the mechanism of action of CI-1017 was untested clinically, the principal objective of the clinical study was to ascertain whether CI-1017 improved cognitive performance at least as fast and as well as tacrine, a commercially available product for this indication. This would be considered proof-of-concept (POC). At this stage of development, while it was not essential to obtain accurate estimates of the magnitude of drug effect, it was necessary to obtain information that would enable an early decision about whether to continue investing resources in the development of the compound. This distinction in key objectives, and awareness that the POC study design was for internal decision making, prompted consideration of study designs that might not be considered suitable for a pivotal registration study.

In preclinical pharmacological studies, rats and mice that were deficient in performing spatial memory tasks (i.e., the ability to find a hidden platform in a water maze (8)) as a consequence of genetic modification or lesioning of the nucleus basalis responded to treatment with CI-1017 but demonstrated a U-shaped dose-response (DR) relationship. As doses were increased, the latency time to find a hidden platform decreased, but at higher doses the latency time increased. A similar outcome was observed in another study that investigated reversal of scopolamine-induced impairment of vigilance in rhesus monkeys, as measured by a continuous performance task. This U-shaped phenomenon has also been described in a clinical setting by Soncrant et al., who reported that the cholinergic agonist arecoline (an alkaloid of the betel nut with agonist activity at M1 and M2 receptors) improved the ability to remember verbally presented words in a small group of subjects with probable or possible Alzheimer's disease (9). When the results were averaged across patients, a U-shaped DR curve was observed. Based on these preclinical and clinical data, it was considered plausible that CI-1017 might also have a U-shaped DR profile, and therefore a secondary trial objective was to characterize the dose-response.

Typically, effectiveness trials in AD are based on a parallel group design with two to four treatment groups plus a placebo group, powered to detect a three-point improvement in the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog) score after a minimum of 12 weeks of treatment (10–12). The ADAS-Cog is an objective test that evaluates memory, attention, reasoning, language, orientation and praxis (maximum score, 70); a decrease in score indicates improvement (13). Assuming a three-point treatment effect size from placebo with a standard deviation of 5.7, a parallel group design requires approximately 80 subjects per dose group to have 90% power (based on a two-sided test and 5% significance level). It has also been consistently demonstrated that cholinesterase inhibitors may take up to 12 weeks to fully reflect the response to a given dose (14,15), and therefore studies designed to demonstrate the effectiveness of a pharmacological agent are at least of this duration.
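The sample size quoted above can be checked with a standard two-sample calculation. The sketch below uses the normal approximation, which is an assumption: the source does not say which formula or software produced the figure of approximately 80. The function name and defaults are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Uses the normal (z) approximation; a t-based calculation would give a
    slightly larger number.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the target power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


# Values taken from the text: 3-point effect, SD 5.7, 90% power, 5% two-sided.
n_raw = n_per_group(delta=3.0, sd=5.7)
n_required = ceil(n_raw)
print(round(n_raw, 1), n_required)  # 75.9 76
```

Under this approximation the requirement is about 76 subjects per group, of the same order as the figure quoted in the text.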

Non-human toxicology studies for CI-1017 permitted up to 12 weeks of treatment and a POC study of this duration was proposed for CI-1017. Human pharmacokinetic and safety data from the Phase 1 single and multiple dose studies indicated that 25 mg TID was the maximum dose with tolerable adverse events in healthy elderly patients. Other doses available for study were 2, 5, 10 and 15 mg.

A population pharmacokinetic-pharmacodynamic analysis of tacrine clinical studies estimated a 3-unit improvement on the ADAS-Cog at 80 mg, with a time to reach 50% of this drug effect (equilibration half-time) of about 2 weeks (14). This implied that the full drug effect (i.e., effect at pharmacodynamic equilibrium) would be apparent by 12 weeks of treatment. As it was not a requirement that the full drug effect be measured to demonstrate that the treatment was superior to placebo, designs that studied the change in ADAS-Cog after treating Alzheimer's patients for less than 12 weeks were considered. Subsequently, a clinical trial simulation study was undertaken to compare how well different designs of approximately equal cost could meet the POC objectives for a range of drug candidates that differed in their pharmacodynamic properties.

Simulation Study Objectives

The primary objectives of the simulation study were to compare the power of the different study designs to detect a treatment effect of a specified size and to compare how well the designs could differentiate a monotonic DR pattern from a U-shaped one. A secondary objective was to evaluate the bias in the drug effect size estimate relative to that assumed at pharmacodynamic equilibrium.

As the true underlying effect for this novel treatment was not known, four theoretical DR curves were considered, each reflected by a unique dose-response model of the drug effect. The curves had either a monotonic (linear, hyperbolic, or sigmoidal) or U-shaped DR relationship and an effect size of 3 units at the best dose in the range up to 25 mg. Alternatives were also considered for the time to reach the pharmacodynamic equilibrium state, being either “fast” (50% of the full drug effect by 3 days) or “slow” (similar to tacrine, with 50% of the full drug effect reached at 2 weeks). Based on the pharmacokinetic convention that five half-lives approximate a steady state, these drugs would reach the pharmacodynamic equilibrium state at about 2 and 10 weeks, respectively. A target sample size was set at 60 based on a preliminary power analysis for a Latin Square design and budget considerations.

Materials and Methods

The Simulation Model

Population Pharmacodynamic Model

A population pharmacodynamic model relating plasma concentrations to ADAS-Cog scores was used to simulate realistic patient responses (14,16). Drug effect was assumed to be symptomatic rather than disease modifying (17). The model and its parameters were based on 5,263 ADAS-Cog measurements obtained from 909 patients recruited in French and American tacrine studies (14). The ADAS-Cog score at any time during the study, \( S(t) \), was a linear combination of sub-models (Eq. (1)) that included a baseline value (\( S_{0} \)), a linear time course for untreated disease progression (\( \alpha \cdot t \)), a placebo effect (\( \text{PD}(C_{\text{eP}}) \)) and a drug effect (\( \text{PD}(C_{\text{eA}}) \)). The residual error (\( \varepsilon \)) was assumed to be normally distributed with variance \( \sigma^{2} \).

$$ S{\left( t \right)}{\text{ = }}S_{0} {\text{ + }}\alpha \cdot t{\text{ + PD}}{\left( {C_{{{\text{eP}}}} } \right)}{\text{ + PD}}{\left( {C_{{{\text{eA}}}} } \right)}{\text{ + }}\varepsilon $$
(1)

The delay in reaching a pharmacodynamic equilibrium state was characterized using the effect compartment model, which assumes the drug effect is dependent on the effect site concentrations, which differ from systemic concentrations prior to the attainment of pharmacokinetic equilibrium (18). Accumulation of drug at the effect site was dependent on the plasma concentration that was predicted by the CI-1017 pharmacokinetic model, and the effect compartment equilibration half-time (\( t_{1/2\text{eq}} \)).
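To give a feel for what the equilibration half-time implies, the sketch below computes the fraction of the equilibrium effect-site concentration reached over time. It assumes a constant plasma concentration from the start of dosing, which is a simplification: the simulation model itself drove the effect compartment with plasma concentrations predicted by the pharmacokinetic model. The function name is illustrative, and the half-times of 3 and 14 days are the "fast" and "slow" values given earlier in the text.

```python
from math import exp, log


def fraction_of_equilibrium(t_days, t_half_eq_days):
    """Fraction of the equilibrium effect-site concentration reached at t_days.

    Assumes a constant plasma concentration from t = 0 (illustrative
    simplification), so Ce(t) / Ce(equilibrium) = 1 - exp(-k_eq * t).
    """
    k_eq = log(2) / t_half_eq_days  # first-order equilibration rate constant
    return 1.0 - exp(-k_eq * t_days)


for label, t_half in (("fast", 3.0), ("slow", 14.0)):
    for weeks in (2, 4, 12):
        frac = fraction_of_equilibrium(7 * weeks, t_half)
        print(f"{label}: {weeks:2d} weeks -> {frac:.3f}")
```

Under this simplification the slow drug type reaches 75% of its equilibrium effect-site concentration after 4 weeks and about 98% after 12 weeks. For the non-linear drug effect models the fraction of effect would differ from the fraction of concentration.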

Placebo Response Model (PD(C ep))

Patients entering an Alzheimer's trial may demonstrate a placebo response. Therefore, a placebo component was included in the simulation model. Based on the placebo model described by Holford and Peace (16), a placebo response (\( \text{ADAS-Cog}_{p}(t) \)) was assumed that would develop over the early days of treatment and subsequently fade to zero after about 6 weeks. Mathematically, the time course of placebo response can be described by Eq. (2) with parameters \( \text{Keq}_{p} \), the rate constant defining the onset rate of the placebo effect (days⁻¹); \( \text{Kel}_{p} \), the rate constant defining the offset rate of the placebo effect (days⁻¹); and \( \beta_{p} \), a scaling parameter defining the size of the placebo effect. The time in days after the initiation of treatment is represented by the independent variable \( t \).

$${\text{ADAS}} - {\text{Cog}}_{p} {\left( t \right)} = \frac{{\beta _{p} \times {\text{Keq}}_{p} }}{{{\left( {{\text{Keq}}_{p} - {\text{Kel}}_{p} } \right)}}} \times {\left( {e^{{ - {\text{Kel}}_{p} \times t}} - e^{{ - {\text{Keq}}_{p} \times t}} } \right)}$$
(2)

The onset half-life of the placebo effect is related to Keq p by Eq. (3).

$$t_{{1 \mathord{\left/ {\vphantom {1 {2_{{eqm}} }}} \right. \kern-\nulldelimiterspace} {2_{{eqm}} }}} = \frac{{\ln {\left( 2 \right)}}}{{Keq_{p} }}$$
(3)

The offset half-life of the placebo effect is related to Kel p in a similar fashion.
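Equations (2) and (3) can be sketched as a short function. The parameter values used here (a scaling parameter of -2 and onset and offset half-lives of 3 and 10 days) are hypothetical placeholders chosen only to show the shape of the curve; they are not the Table II values. The sign convention (negative means improvement) and the helper `time_of_peak` are also assumptions made for illustration.

```python
from math import exp, log


def placebo_response(t_days, beta_p, t_half_onset, t_half_offset):
    """Placebo effect on ADAS-Cog at time t_days, following Eq. (2)."""
    keq_p = log(2) / t_half_onset    # Eq. (3): onset rate constant
    kel_p = log(2) / t_half_offset   # same relationship for the offset rate
    if keq_p == kel_p:
        raise ValueError("Eq. (2) is undefined when Keq_p equals Kel_p")
    scale = beta_p * keq_p / (keq_p - kel_p)
    return scale * (exp(-kel_p * t_days) - exp(-keq_p * t_days))


def time_of_peak(t_half_onset, t_half_offset):
    """Time at which the magnitude of Eq. (2) is largest (derivative is zero)."""
    keq_p = log(2) / t_half_onset
    kel_p = log(2) / t_half_offset
    return log(keq_p / kel_p) / (keq_p - kel_p)


# Hypothetical parameter values, for illustration only.
BETA_P, ONSET, OFFSET = -2.0, 3.0, 10.0
t_peak = time_of_peak(ONSET, OFFSET)
for day in (0, 7, 14, 28, 42, 84):
    print(day, round(placebo_response(day, BETA_P, ONSET, OFFSET), 3))
```

With any positive half-lives the response starts at zero, rises to a single peak, and decays back toward zero, which is the qualitative pattern described in the text.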

Drug Response Model (PD(C eA))

Five different drug response models were used. One model characterized the drug as inactive (no effect) while the other four described a linear, hyperbolic, sigmoidal or U-shaped relationship. Mean parameter values for the active concentration response relationships were calculated based on lowering the mean ADAS-Cog score by three points at the best dose in the testable range. This was at 25 mg for monotonic patterns and 10 mg for the U-shape pattern. Plots and parametric forms of the drug effect models are displayed in Fig. 1 and Table I, respectively. The parameters characterizing drug potency for the linear and non-linear models (i.e., slope or ED50 [the dose at which 50% of the maximum drug effect was achieved]) were converted to an average steady state concentration in a 65-year-old non-smoking population.

Fig. 1 Drug effect models considered in the simulation study. Parameters characterizing the model are displayed in the individual panels.

Table I Parametric Form of Models Describing Active or Inactive Drug Effect.

To reflect between-patient random variation, the baseline ADAS-Cog score (\( S_{0} \)), rate of score change (\( \alpha \)), onset half-life of placebo effect (\( t_{1/2\text{eqp}} \)), offset half-life of placebo effect (\( t_{1/2\text{elp}} \)), placebo effect magnitude (\( \beta_{p} \)), and active drug equilibration half-life (\( t_{1/2\text{eqa}} \)) were simulated from univariate independent log-normal distributions. The coefficient of variation (CV) for these distributions is shown as percentage CV in Table II.

Table II Parameter Values Characterizing Disease Progression, Placebo Effect, and Drug Effect Components used in Simulation Model

Disease progression, placebo response, and drug effect parameter values are displayed in Table II. Population variability parameters were arbitrarily set to 30% based on discussions with consultants and values reported by Holford and Peace (14). The drug effect equilibration half-time and placebo response reported were associated with the different study populations and the parameter estimates selected for these components in this simulation study reflected an American population (14).
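Drawing subject-level parameters from independent log-normal distributions with a 30% CV can be sketched as follows. Two assumptions are made here that the text does not state: the typical value is treated as the median of the distribution, and the log-scale standard deviation is obtained from the exact relationship between CV and log-scale variance rather than the common approximation that the two are equal. The 14-day typical value is the "slow" equilibration half-time given in the text; the function names and seed are illustrative.

```python
import math
import random
import statistics


def sigma_from_cv(cv):
    """Log-scale standard deviation of a log-normal with the given CV."""
    return math.sqrt(math.log(1.0 + cv ** 2))


def sample_lognormal(typical_value, cv, n, rng):
    """Draw n subject-level values; typical_value is treated as the median."""
    mu = math.log(typical_value)
    sigma = sigma_from_cv(cv)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
half_times = sample_lognormal(typical_value=14.0, cv=0.30, n=20000, rng=rng)

sample_cv = statistics.stdev(half_times) / statistics.fmean(half_times)
print(round(sigma_from_cv(0.30), 4))
print(round(statistics.median(half_times), 2), round(sample_cv, 3))
```

All draws are positive, which is one practical reason for choosing a log-normal distribution for half-lives and baseline scores.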

Pharmacokinetic Model

Single doses ranging from 0.25 to 150 mg and multiple doses of 2, 10, 25, and 50 mg q6h were rapidly absorbed. The time of maximal plasma concentrations (tmax) occurred approximately 1 h after oral administration. With multiple-dose administration of CI-1017, maximum plasma drug concentrations (C max) and total exposures (AUC) increased in greater proportion relative to the increase in dose over a dose range of 2 to 50 mg. Multiple doses of 2, 10, and 25 mg administered q6h were generally well-tolerated.

Population pharmacokinetic parameters and estimates of their variability were derived from fitting the Phase 1 data using NONMEM and are displayed in Table III (19). Plasma concentrations of CI-1017 were described by a two-compartment model with first order input and a lag-time. The relative bioavailability of each dose was estimated in comparison to the nominal standard dose of 25 mg. Clearance (CL/F) was dependent on age (years) and smoking status (Eq. (4)).

$$ {\text{CL}} = 94.5 \cdot e^{{ - 0.0135 \cdot {\left( {{\text{age}} - 40} \right)}}} \cdot {\text{smk}} $$
(4)

Table III CI-1017 Population Pharmacokinetic Parameter Estimates

If the subject was a smoker, the variable smk was equal to 1.5, otherwise 1. The between-subject random effects for some pharmacokinetic parameters (clearance, central volume, inter-compartmental clearance, and peripheral volume) were correlated. The correlation was incorporated into the simulation model by using a multivariate normal distribution from the estimated variance–covariance matrix obtained from NONMEM. The estimated between occasion variability (BOV) in pharmacokinetics was also incorporated into the simulation model (20). At each 7-day occasion, random normal distributions with mean 0 and variance \( \sigma ^{2}_{{{\text{BOV}}}} \) were used to generate new subject specific random effects values for BOV, which were then added to the subject's pharmacokinetic parameters.
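The covariate relationship in Eq. (4) can be written as a small function. This sketch covers only the typical (population) clearance; the correlated between-subject effects and between-occasion variability described above are not included. The function name is illustrative and no unit is attached because the text does not give one.

```python
from math import exp


def typical_clearance(age_years, smoker):
    """Typical apparent clearance from Eq. (4) for a given age and smoking status."""
    smk = 1.5 if smoker else 1.0  # smoking multiplier as defined in the text
    return 94.5 * exp(-0.0135 * (age_years - 40.0)) * smk


# Reference subject used in the text for converting potency parameters:
# a 65-year-old non-smoker.
cl_65_nonsmoker = typical_clearance(65, smoker=False)
cl_65_smoker = typical_clearance(65, smoker=True)
print(round(cl_65_nonsmoker, 2), round(cl_65_smoker, 2))
```

The model predicts that clearance falls by about 1.3% per year of age and is 50% higher in smokers at any given age.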

Drop Out Model

A 1% weekly dropout rate was assumed based on an informal review of past experience in these types of trials. This was implemented in the simulation model as a survival model where the expected percentage of patients remaining in the trial at time = t days, S v(t), is described by Eq. (5).

$$ S_{{\text{v}}} {\left( t \right)} = e^{{ - 0.00145 \cdot t}} $$
(5)
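Equation (5) corresponds to a constant dropout hazard, so individual dropout times can be drawn from an exponential distribution. The sketch below checks the weekly rate implied by the hazard and the expected retention at 12 and 16 weeks; the sampling function and seed are illustrative and are not taken from the simulation software used in the study.

```python
import random
from math import exp

HAZARD_PER_DAY = 0.00145  # rate constant from Eq. (5)


def fraction_remaining(t_days):
    """Expected fraction of patients still in the trial at t_days, Eq. (5)."""
    return exp(-HAZARD_PER_DAY * t_days)


def simulate_dropout_day(rng):
    """Draw one patient's dropout time in days under a constant hazard."""
    return rng.expovariate(HAZARD_PER_DAY)


weekly_dropout = 1.0 - fraction_remaining(7)
print(f"weekly dropout: {weekly_dropout:.4f}")
print(f"remaining at 12 weeks: {fraction_remaining(84):.3f}")
print(f"remaining at 16 weeks: {fraction_remaining(112):.3f}")

rng = random.Random(1)
dropout_days = [simulate_dropout_day(rng) for _ in range(20000)]
completers_16wk = sum(day > 112 for day in dropout_days) / len(dropout_days)
```

A patient whose simulated dropout day exceeds the trial length is treated as a completer. With this hazard roughly 85% of patients would be expected to complete a 16-week trial.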

Trial Designs

Eight trial designs were evaluated, including Latin Square, incomplete block, and parallel group designs, as well as two composite designs that included both crossover and parallel group arms. Each design had approximately equal sample size (60 total), as a surrogate for trial cost. No patient received non-placebo treatment for more than 12 weeks; the total trial length was 12 weeks for all designs except number 8, which was 16 weeks. The key characteristics of these designs are shown in Table IV. Table V shows the dosing sequences for the six groups in design number 1, a six-period, six-sequence-group Latin Square that is balanced for pair-wise treatment sequence (a Williams design) (21). Design numbers 6–8 were based on a similarly defined 4 × 4 Latin Square, using 2, 10, and 25 mg doses and placebo. The lower doses (2, 10 mg) were selected to approximate the ED10 and ED50 while the 25 mg dose was selected to achieve a 3-unit change in the ADAS-Cog for the assumed population average monotonic dose-response relationships. For the U-shaped dose-response relationship, 10 mg defined the population average peak effect, while the bordering doses of 2 and 25 mg mediated a small effect (less than 25% of E max for the agonist–antagonist model).

Table IV Trial Designs Evaluated in Simulation Study
Table V Dosing Sequences in Design Number 1
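A Latin Square balanced for pair-wise treatment sequence can be built with a standard cyclic construction for an even number of treatments. The sketch below is one such construction and is offered only as an illustration: the sequences it prints are not claimed to match those in Table V or those used in designs 6–8, and the mapping of treatment codes to doses is an assumption.

```python
from collections import Counter


def williams_design(n_treatments):
    """Return an n x n Latin Square (rows = sequences, columns = periods)
    in which every treatment follows every other treatment exactly once.

    Cyclic construction; valid for an even number of treatments.
    """
    if n_treatments < 2 or n_treatments % 2 != 0:
        raise ValueError("this construction needs an even number of treatments")
    first_row = [0]
    low, high = 1, n_treatments - 1
    for position in range(1, n_treatments):
        if position % 2 == 1:   # odd positions take the next low code
            first_row.append(low)
            low += 1
        else:                   # even positions take the next high code
            first_row.append(high)
            high -= 1
    # Remaining sequences are cyclic shifts of the first one.
    return [[(t + shift) % n_treatments for t in first_row]
            for shift in range(n_treatments)]


def carryover_counts(design):
    """Count how often each ordered pair (previous, current) occurs."""
    return Counter((row[i], row[i + 1])
                   for row in design for i in range(len(row) - 1))


labels = ["placebo", "2 mg", "10 mg", "25 mg"]  # illustrative coding
design_4x4 = williams_design(4)
for sequence in design_4x4:
    print(" -> ".join(labels[t] for t in sequence))
```

Each treatment appears once per sequence and once per period, and each ordered pair of distinct treatments occurs in consecutive periods exactly once, which is the property that lets first-order carryover be estimated without bias toward any one treatment.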

Data Evaluation Methods

An analysis of variance (AOV) model appropriate to each design was used to analyze the simulated data. For each trial objective, a decision rule was defined to translate a trial's analysis results into an outcome from which the percentage correct was calculated (over the 100 replicate trials). The analysis methods employed for each of the trial objectives are detailed more explicitly as follows.

  1. Primary Objective A:

    Does the drug work?

    The AOV was used to test the null hypothesis of no drug effect on ADAS-Cog. Rejection of the null hypothesis constituted a “positive study” finding, which would be judged as “correct” for all data models except the no effect model, for which “not positive” was correct. The specific AOV model depended upon the design, but the decision rule for declaring a “positive study” was based on a 2 degrees of freedom (linear and quadratic) dose trend test at the two-tailed 5% significance level together with at least one dose declared statistically better than placebo (p < 0.025, one tail).

  2. Primary Objective B:

    Is the shape monotonic or U-shaped?

    After exploring various parametric and non-parametric approaches, a simple but robust criterion was adopted for deciding whether a trial's data supported a monotonic or U-shaped pattern (within the tested dose range). A two-stage approach was used, initiated with a test for activity that was identical to the procedure used in primary objective A, except for using a higher Type I error setting (0.1 one tail rather than 0.025 one tail). As this was not a pivotal registration study and was to be used for internal decision-making, relaxing the critical value for rejecting the null hypothesis and increasing the Type I error to 0.1 for the shape classification algorithm was considered justified. For a non-positive trial (by this relaxed standard), the response pattern was classified as ‘flat’ (i.e., essentially as ‘no information regarding shape’); otherwise an inference was made between monotonic and U-shaped based on the pattern of the estimated dose group means from the AOV. For the four-dose group designs, monotonicity was declared if the highest dose group had the best mean outcome; otherwise a U-shaped pattern was declared. For the six-group designs, monotonicity was declared if either of the two highest dose groups had the best mean.

  3. Secondary Objective:

    What is the Accuracy of the Pharmacodynamic Equilibrium Effect Estimate?

    An effect size estimate within one point (33%) of the true steady state effect size was considered “correct” for the purposes of the simulation study. The simulated estimate at the 25 mg dose was used for the monotonic DR patterns (regardless of the outcome for objective number 2), or the 10 mg dose when the true pattern was U-shaped.
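The three decision rules above can be sketched as simple functions that take the results of an already-fitted AOV. Several details are assumptions made for illustration: the "best" mean is taken to be the lowest ADAS-Cog mean among the active doses, the relaxed test for the shape objective changes only the one-tailed pairwise threshold and leaves the trend test at 5%, and all function and argument names are hypothetical.

```python
def is_positive(trend_p_two_tailed, pairwise_p_one_tailed,
                trend_alpha=0.05, pairwise_alpha=0.025):
    """Objective A: trend test significant AND at least one dose beats placebo.

    pairwise_p_one_tailed maps each active dose (mg) to its one-tailed p-value.
    """
    any_dose_better = any(p < pairwise_alpha
                          for p in pairwise_p_one_tailed.values())
    return trend_p_two_tailed < trend_alpha and any_dose_better


def classify_shape(trend_p_two_tailed, pairwise_p_one_tailed,
                   dose_means, n_top_doses, pairwise_alpha=0.10):
    """Objective B: 'flat', 'monotonic' or 'U-shaped'.

    dose_means maps each active dose (mg) to its estimated mean ADAS-Cog
    outcome (lower is better). n_top_doses is 1 for the four-group designs
    and 2 for the six-group designs.
    """
    if not is_positive(trend_p_two_tailed, pairwise_p_one_tailed,
                       pairwise_alpha=pairwise_alpha):
        return "flat"
    doses = sorted(dose_means)
    best_dose = min(doses, key=lambda d: dose_means[d])
    return "monotonic" if best_dose in doses[-n_top_doses:] else "U-shaped"


def effect_estimate_is_accurate(estimate, true_effect=-3.0, tolerance=1.0):
    """Secondary objective: estimate within one point of the true effect."""
    return abs(estimate - true_effect) <= tolerance


# Example with made-up analysis results for a four-group design.
shape = classify_shape(
    trend_p_two_tailed=0.01,
    pairwise_p_one_tailed={2: 0.30, 10: 0.004, 25: 0.20},
    dose_means={2: -0.4, 10: -2.8, 25: -0.6},
    n_top_doses=1,
)
print(shape)  # U-shaped
```

Scoring a simulated trial then amounts to comparing each function's output with the known truth for the data model that generated it.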

Simulation Methods

One hundred trial replications were simulated using Pharsight Trial Simulator (22) for the different study designs listed in Table IV, for each of the dose-response patterns and for both the slow and fast drug types. The residual error, the effect size at the best dose, and the number of subjects were held constant, while the period length, number of doses, and number of measurements per period were varied. The data from each trial were analyzed to obtain and score the conclusions for each trial objective. The percentages of correct trials were tabulated for review, leading to the design recommendations described below.

The Clinical Study

Based on the results of the simulations, the recommended design was carried out in an Alzheimer's population. Both the 2- and 4-week measurements within each dose were used in the analysis of the ADAS-Cog subscale. The changes from baseline in total score on the ADAS-Cog were analyzed using a mixed-model analysis of variance that was similar to the method used in the simulation. The model contained fixed effect terms for treatment sequence, period, carryover effect, baseline value, and dose. Random effects were included to model the correlation between measurements within a patient, and to model the correlation between the 2- and 4-week measurements within each dose. A two-degree of freedom linear-quadratic trend test was performed to investigate dose-response (5%, two-tail). The study was to be considered positive if both the trend test (5%, two-tail) and the improvement of ADAS-Cog relative to placebo at one or more active doses (2.5%, one-tail) were significant.

Results

Simulation Study Results

For detecting drug activity, all designs correctly declared the no effect drug candidate as “not positive” with an error rate of about 2.5% (observed rates based on 100 simulated trials ranging from 1% to 4%). For each design, the percentage of positive trials averaged over the four dose-response patterns is displayed in Table VI, for the slow and fast acting drug types.
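Because each percentage is estimated from 100 simulated trials, it carries Monte Carlo uncertainty. The sketch below computes the binomial standard error of an estimated percentage, assuming independent replicate trials; it is an interpretive aid and not part of the analysis reported in the study.

```python
from math import sqrt


def monte_carlo_se(true_rate, n_trials=100):
    """Binomial standard error of a proportion estimated from n_trials replicates."""
    return sqrt(true_rate * (1.0 - true_rate) / n_trials)


se_false_positive = monte_carlo_se(0.025)  # nominal error rate for the no-effect drug
se_power_80 = monte_carlo_se(0.80)         # a design with 80% true power

print(f"SE at 2.5%: {100 * se_false_positive:.2f} percentage points")
print(f"SE at 80%:  {100 * se_power_80:.2f} percentage points")
```

With a standard error of about 1.6 percentage points, observed rates of 1% to 4% are consistent with a true error rate of 2.5%. Power estimates near 80% carry a standard error of about 4 percentage points, so small differences between designs should be read with that in mind.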

Table VI Estimated Power for Detecting Activity (%), Averaged over the Four Dose-response Patterns

For each drug type, the best designs were 1, 6, 7, and 8, with design number 8 (4 × 4 Latin Square, 4-week treatment periods) being the best for both drug types. The parallel group design (number 3) was the poorest performer for both drug types. Only design 8 reached the 80% power level on average over the four dose-response patterns with the sample sizes used. Across designs, the estimated power was always higher for the fast acting drug. Therefore, to simplify the description of further results, only the design performance for the slow acting drug is reported, as this characterizes trial performance for the least optimistic scenario. Table VII partitions the average power to detect a drug effect into the results for each of the active drug profiles. Table VIII displays the estimated power to correctly identify the dose-response shape. For all shapes, design 8 performed as well as or better than designs 6 and 7.

Table VII Percent of 100 Trials (power) that Detected a Drug Effect for Slow Acting Drug for Design Number 6, 7 and 8
Table VIII Percent of 100 Trials (power) that Correctly Identified Dose-response Shape for Slow Acting Drug for Design Number 6, 7 and 8

Table IX displays the power to correctly estimate the true effect size (to within +/− 33% of the true effect). As expected the overall level of performance is low. Averaged over the four DR models, designs 6, 7 and 8 are similar. For all the objectives, the power of any design was always lower for the U-shaped drug type compared to the drug type with monotonic dose-response characteristics.

Table IX Percent of 100 Trials (Power) that Estimated the Effect Size for the Slow Acting Drug at 25 mg (monotonic) or 10 mg (U-shape) to be within +/− 33% of the True Effect

Based on these results, design 8 was selected for the actual clinical trial. Seventy patients were randomized to one of four assigned treatment sequences. Each treatment sequence consisted of three dose levels of CI-1017 (2, 10, and 25 mg) taken three times a day (TID) and matching placebo with each dose level administered sequentially. The sequence of administration was randomly assigned. All patients were intended to receive 4-week treatment with each of the dose levels of CI-1017 and placebo.

Clinical Study Results

The results of the clinical study are displayed in Table X. The study was not considered positive since the linear-quadratic trend test of ADAS-Cog scale was not significant and none of the three doses of CI-1017 were significantly better than placebo. The root mean square error (RMSE) derived from the mixed-model analysis of variance was 2.47 ADAS-Cog units.

Table X Analysis of ADAS Cognitive Results (Intent-to-Treat) from a 16-week, Randomized, Double-blind, Placebo-Controlled, Crossover, Multi-Center Study of CI-1017 in Patients with Mild to Moderate Alzheimer's Disease

Discussion

It is often thought that the most convincing studies that demonstrate the benefit of therapy are parallel group designs with long treatment phases, in part because the drug effect at the time an equilibrium state has been reached is considered to be the critical endpoint. Crossover designs on the other hand are considered to harbor methodological problems and confound estimates of the treatment effect under various conditions, particularly when carry-over is present (3). This paper describes a clinical trial simulation study that provided a quantitative comparison of the potential performance of different study designs in Alzheimer's disease for fast and slow acting drug types and ultimately supported the implementation of a cross-over design for a POC study. While this example is limited to a POC trial in Alzheimer's disease, the concepts discussed are not limited to that therapeutic area and would be applicable to many Phase 2 and 3 clinical trial settings.

The simulation model was characterized by parametric descriptions of the drug pharmacokinetics and pharmacodynamics, the natural progression of the disease, and the placebo effect. Estimates of the residual error and within subject variability for the different parameters were also included in the model. The simulated data were analyzed using a polynomial-based approach for dose-trend testing. This analysis method satisfied a basic requirement that the test be appropriate in the presence of non-monotonicity of response with dose, a requirement that excluded any trend test for which the validity depends upon monotonicity. Pragmatic considerations and time constraints were also factors in the choice of the analysis method for the simulations. It was recognized that the analysis method could have some effect on the relative estimates of design performance; however, it was decided not to compare the performance of different dose trend tests. This would have added another dimension to what was already a complex project, potentially compromising the timely delivery of the simulation results to the development team. Holford reports between-subject variability of 208% for disease progression and approximately 100% for placebo and drug potency parameters (14) and did not estimate random effects for any other parameters. In this simulation study, parameters accounting for the between-subject variability were assigned to all fixed effect parameters with the exception of E max, and this partitioning was considered to result in an overall between-subject variability that approximated that reported by Holford. However, as the primary consideration was to understand the performance of a trial design using this set of assumptions, the impact of the size of the random effects was not explored. The residual variability matched that described in the literature.

Because the model and model parameters were based on data analyzed almost 15 years ago (14,16), the possibility of a change in the time course of the disease and placebo response between former and latter patient populations merits comment. In a study of 331 patients receiving a placebo treatment reported by Feldman et al. in 2005 (23), the mean 12-month decline in ADAS-Cog was 5.6 U/year (95% CI: 4.8–6.4). Stern et al. (1994) reported a decline of 4.9 points that falls within this confidence interval (24), while a number of placebo-controlled studies (25–29) have shown that the annual decline in cognition is 5–8 points, with a 2-year decline of seven points reported by Sano et al. in 1997 (28). Nevertheless, even if the disease progression rate was slower than that used in this simulation study, the impact would be minimal because the effect would be equal for all treatment groups and the duration of the study was short. With regard to the placebo response, it is difficult to make assertions as to whether this profile has changed over time because investigators do not usually differentiate the placebo response from the underlying disease progression.

As the principal goal of this Phase 2a study was to answer, as quickly as possible and given the target sample size, the key question of whether the drug had any effect at all, it was not considered necessary to measure the steady state treatment effect. Additionally, even though the trial was considered a tool for internal decision-making and would not satisfy the regulatory requirements for a confirmatory study, it was believed that a positive outcome would provide suitable supportive data for registration. The 4 × 4 crossover design was demonstrated to be the best design among those considered for detecting activity. For the fast acting drug type, which produces a larger effect size sooner than the slower one, the power for detecting the effect was higher regardless of design. Therefore the design evaluation focused on the performance assuming administration of the drug type that was slower acting. The lower power that was consistently observed for the drug type with U shape characteristics reflects the fact that the mean effect size at any of the dose options was less than 3, even though the population model defined a three-point effect at 10 mg. Due to the variability in ECeAU50 and ICeAnU50 parameters specified in the agonist–antagonist simulation model, an individual's “best dose” would not necessarily be at any of the doses included in the simulation study. Consequently this would lead to an average effect size of less than three points at any one of the doses studied. This simulation approach reflects inter-subject variability in drug potency, which was considered a realistic pharmacological concept. Other methods, e.g., an agonist–antagonist model that excluded inter-subject variability and only described residual variability, were not considered pharmacologically realistic.

While there were significant advantages to the implementation of the cross-over design in this instance, the design and analysis strategy (ANOVA) itself does have limitations, and these should be addressed when considering this type of study. For example, the design and analysis is unsuitable for evaluating beneficial or adverse treatment effects that are apparent only after 4 weeks of therapy because the observed effects would be assigned to the wrong treatment. Further, this design would be inefficient for evaluating effects that persist unchanged into subsequent periods; the effect estimates become increasingly biased as carry-over increases. Alternatively, a pharmacologically based model analysis would explicitly account for carryover in a mechanistic way.

Further development of CI-1017 was terminated based on the results of the clinical study. The residual error standard deviation of 2.47 estimated from the actual POC study was 60% of the value observed in parallel group tacrine trials and 60% of the value used in the trial simulations (four ADAS-Cog units). This implied that the study was better powered to detect a signal than a similar sized parallel group trial, and the outcome appeared consistent with results for another drug in this class. Veroff et al. reported a mean treatment difference of −1.26 (p = 0.22), based on an intent-to-treat (ITT) analysis in patients administered 75 mg TID of xanomeline (an M1/M4 selective muscarinic agonist) for 24 weeks. Visual inspection of a figure depicting the change in mean ADAS-Cog score at 4, 8, 12 and 24 weeks, based on the ITT population, suggests no difference between the placebo and drug treated groups at 4 weeks and that the maximal drug effect is exerted between 8 and 12 weeks (30).

The implementation of the crossover design resulted in substantial time and cost savings relative to a traditional parallel group study. The minimum direct costs associated with a conventional 12-week, 300-patient Alzheimer's POC study in 25–30 centers were estimated to be about US$4M (2001), with the study extending over a 24-month duration from the time of enrollment of the first subject until the reporting of the results. In this example, because fewer patients and recruitment centers were required, the study was executed for one quarter of the cost (approximately US$1M) and the time from enrollment of the first patient to reporting of results was reduced to 7 months. These time-savings enabled resources to be reallocated to other development projects, and the additional value of this “indirect saving” is considerable and should not be overlooked. The approach of initially determining whether the drug has any activity over a short time period, in place of estimating the drug effect after long-term treatment, enables an earlier termination of a development program if a drug effect is not detected. Such an approach is consistent with the theme of developing new tools and practices to improve the critical path or product development process as described in the FDA's recent document that addresses how to get fundamentally better answers about the safety and effectiveness of new medical products (31).

The use of placebo controls in Alzheimer's disease has been the center of debate and is considered an unethical practice by some, especially if the treatment is long term. Use of the crossover design as described herein, may offer a solution to this dilemma in POC studies of acute symptomatic treatments because patients are assigned a placebo treatment for much shorter periods (32,33). Another benefit of the shorter placebo period might be reduced drop out due to lack of effectiveness and enhanced recruitment as all patients will receive study drug during the trial.

This simulation study reflects a changing attitude in drug development. Firstly, the primary objective was to obtain sufficient evidence to determine if further development of the compound was justifiable by focusing on whether any activity could be detected after 4 weeks, as opposed to estimates of the drug effect after 12 weeks. Estimating the drug effect following 12 weeks of treatment was considered the goal of a Phase IIB dose-response study. Secondly, a semi-mechanistic PKPD approach was used for simulating realistic patient level data as a function of drug exposure and time, and this enabled a quantitative comparison of the performance of different study designs under the different scenarios. This quantitative comparison eventually convinced skeptics to move forward with an atypical design. Because the study provides a “yes or no answer” to the fundamental question “does the drug work,” it can be described as a confirming type study. Sheiner has previously discussed the distinction between learning and confirming studies (34). Had the study yielded a positive outcome, analysis of the data in a stand-alone mode or coupled with prior information might then have provided the answers to more quantitative questions such as “what is the expected steady state effect size at each dose”, which is a “learning” type question.

In summary, these clinical trial simulation results helped the development team better understand and compare the operating characteristics of eight plausible POC trial designs. In the end the chosen design proved to be more efficient than a traditional clinical development approach leading to considerable savings in time to decision and trial costs. The results of this CTS support the continued application of modeling and simulation to aid the design of clinical studies.