Introduction

Alzheimer’s disease (AD) is the most common cause of dementia, accounting for 60% to 80% of all cases [1]. Pathologically, the disease is defined by the presence in the brain of extracellular neuritic plaques containing amyloid beta peptide (Aβ) and intraneuronal neurofibrillary tangles. The “amyloid cascade” hypothesis proposes that the driving force behind the disease process is the accumulation of Aβ [2].

Aducanumab is a human immunoglobulin G1 monoclonal antibody that recognizes aggregated forms of Aβ. An association between the presence of antibodies that recognize amyloid plaques and a slowing of cognitive decline in subjects with mild to moderate AD dementia has been described [3]. Therefore, Biogen’s hypothesis is that targeting aggregated forms of Aβ with aducanumab will attenuate the course of AD.

In a Phase 1b proof-of-concept study, aducanumab treatment resulted in a dose- and time-dependent reduction in Aβ plaques, accompanied by slowing of clinical decline on the Clinical Dementia Rating–Sum of Boxes (CDR–SB) score. This effect was statistically significant in both the fixed-dose and titration groups treated with aducanumab compared with placebo. The CDR–SB results from this study support the clinical hypothesis that reducing brain Aβ, as measured by positron emission tomography and expressed as a standardized uptake value ratio, confers a clinical benefit [4]. Together, these proof-of-concept findings supported investigating aducanumab as a treatment to modify the underlying pathophysiology of AD.

In two identical Phase 3 studies (221AD301 and 221AD302, hereafter referred to as ENGAGE and EMERGE, respectively) that evaluated the efficacy of aducanumab in subjects with early AD, EMERGE met the primary study endpoint, while ENGAGE did not [5]. It was hypothesized that protocol amendments altering dosing and the management of amyloid-related imaging abnormalities (radiographic findings believed to result from accelerated clearance of amyloid from the parenchyma [6]) resulted in variable exposure patterns and contributed to the more favorable treatment effect in EMERGE than in ENGAGE.

Therefore, the objective of this research was to develop a modeling framework that (1) describes the natural progression of the disease as characterized by the primary study endpoint of change in CDR–SB score in the placebo group, (2) evaluates the existence of and characterizes the relationship between exposure (measures of concentration) and CDR–SB (exposure–response; ER) in ENGAGE and EMERGE using population modeling, and (3) assesses relevant covariate effects on disease progression and drug-effect parameters from the ER model.

Methods

Study population and design

The data set used in this ER analysis comprised pooled data from the placebo-controlled periods of two Phase 3 studies (ENGAGE, NCT02477800; EMERGE, NCT02484547) of aducanumab in subjects with early AD. The two studies were identical in design and evaluated the effect of aducanumab in populations with similar disease characteristics. ENGAGE and EMERGE were randomized, double-blind, placebo-controlled, parallel-group studies with an 18-month dose-ranging period in subjects with early AD. Eligible subjects were randomized 1:1:1 to aducanumab low dose, aducanumab high dose, or placebo. Randomization was stratified by apolipoprotein E ε4 (ApoE ε4) carrier status [5]. Additional details about the studies are provided in Table 1 [5].

Table 1 Summary of clinical studies used in exposure–CDR–SB model

There were two major protocol amendments after the initiation of ENGAGE and EMERGE, both of which affected dosing. In the first amendment, subjects who suspended treatment because of an amyloid-related imaging abnormality could, after the findings resolved, resume treatment at the same dose and continue dose titration. The second amendment increased the target dose from 6 to 10 mg/kg in ApoE ε4 carriers in the high-dose group [5]. ENGAGE and EMERGE were terminated early (21 March 2019) based on a futility analysis of interim data performed after ≈ 50% of the participants had had the opportunity to complete week 78.

Modeling data

Exposure data

A previously developed population pharmacokinetic (PK) model was used to derive exposure values for the ER model [7]. This model was developed using data from 2961 subjects (Phase 1 and Phase 3) with 50,306 PK observations. Aducanumab PK was characterized by a linear two-compartment model with first-order elimination. The model estimated the mean (95% CI) half-life and time to reach steady state as 24.8 (14.8, 37.9) days and ≈ 4 (2.42, 6.19) months, respectively. In this analysis, model-predicted drug concentrations over time were generated using the individual empirical Bayes PK parameter estimates, the actual dosing history, and the final population PK model structure. The individual concentration–time profiles by study and dosing group are provided in Supplementary Fig. 1.
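
To make the structural PK model concrete, the sketch below integrates a linear two-compartment model with first-order elimination over a dosing history given every 4 weeks and computes the terminal half-life from the micro-constants. The parameter values (CL, V1, Q, V2), the 700-mg dose, and the treatment of each infusion as an instantaneous bolus are hypothetical assumptions for illustration only; they are not the published population PK estimates, and the individual empirical Bayes parameters used in the analysis are not reproduced here.

```python
import math


def terminal_half_life(cl, v1, q, v2):
    """Terminal (beta-phase) half-life of a linear two-compartment model."""
    k10, k12, k21 = cl / v1, q / v1, q / v2
    s = k10 + k12 + k21
    # alpha and beta are the roots of lambda^2 - s*lambda + k10*k21 = 0;
    # beta (the smaller root) governs the terminal phase.
    beta = (s - math.sqrt(s * s - 4.0 * k10 * k21)) / 2.0
    return math.log(2.0) / beta


def simulate_two_cmt(dose_times, dose_mg, cl, v1, q, v2, t_end, dt=0.05):
    """Central concentrations (mg/L) on a fixed grid (days), RK4 integration.

    Each dose is treated as an instantaneous bolus into the central
    compartment (an assumption: infusions are short relative to t1/2).
    """
    k10, k12, k21 = cl / v1, q / v1, q / v2

    def rates(a1, a2):
        # First-order transfer between central (a1) and peripheral (a2) amounts
        return (-(k10 + k12) * a1 + k21 * a2, k12 * a1 - k21 * a2)

    pending = sorted(dose_times)
    a1 = a2 = 0.0
    profile = []
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        while pending and pending[0] <= t + 1e-9:  # add any dose due now
            a1 += dose_mg
            pending.pop(0)
        profile.append((t, a1 / v1))
        r1 = rates(a1, a2)
        r2 = rates(a1 + 0.5 * dt * r1[0], a2 + 0.5 * dt * r1[1])
        r3 = rates(a1 + 0.5 * dt * r2[0], a2 + 0.5 * dt * r2[1])
        r4 = rates(a1 + dt * r3[0], a2 + dt * r3[1])
        a1 += dt / 6.0 * (r1[0] + 2 * r2[0] + 2 * r3[0] + r4[0])
        a2 += dt / 6.0 * (r1[1] + 2 * r2[1] + 2 * r3[1] + r4[1])
    return profile


# Hypothetical parameters (not the published estimates): CL, Q in L/day; V in L.
CL, V1, Q, V2 = 0.2, 3.0, 0.5, 3.0
t_half = terminal_half_life(CL, V1, Q, V2)
q4w = simulate_two_cmt([28.0 * k for k in range(13)], 700.0, CL, V1, Q, V2, t_end=364.0)

# Rule of thumb: ~5 terminal half-lives (about 97% of steady state).
months_to_ss = 5 * 24.8 / 30.44   # reported half-life of 24.8 days
```

With the reported 24.8-day half-life, five half-lives come to 124 days, or roughly 4 months, which is in line with the stated time to steady state.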

CDR–SB

The CDR–SB score was the primary outcome measure used to assess clinical efficacy. The CDR–SB is a composite score ranging from 0 to 18, based on three domains of cognition (memory, orientation, judgment/problem solving) and three domains of function (community affairs, home/hobbies, personal care), each scored from 0 to 3 [8]. The CDR–SB was assessed at baseline and at weeks 26, 50, and 78. The data set for this analysis included all CDR–SB measurements from both studies, with the actual date and time of testing.

Handling bounded outcomes

The CDR–SB score is constrained between lower and upper limits (here, 0–18). To deal with floor and ceiling CDR–SB scores, the modeling was performed on the logit scale using a logit link function.

Using a logit link function, a disease-progression (DP) model can be formed and presented as follows:

$${LCDR}_{ij}=\mathrm{log}\left(\frac{\left(\frac{{CDR}_{ij}}{18}\right)}{\left(1-\left(\frac{{CDR}_{ij}}{18}\right)\right)}\right)=g\left(\theta , {\eta }_{i},{x}_{i}\right)+{\varepsilon }_{ij},$$
(1)

where LCDRij denotes the logit transformation of the scaled CDR–SB score (CDRij/18) for the ith subject at the jth time; g(θ, ηi, xi) is a function of the regression covariates (xi), the fixed effects (θ), and the interindividual random effects (ηi); and εij is the intraindividual random effect (measurement error). The interindividual and intraindividual random effects have a mean of 0 and variances of ω² and σ², respectively.

The scaled CDR–SB score yij = CDRij/18 used in the logit transformation must lie in the open interval (0, 1), because the logit is undefined at the boundaries. Scaled CDR–SB values at the boundaries (i.e., 0 or 1) were therefore imputed as 0.005 and 0.995, respectively, so that the logit transformation in Eq. 1 could be applied. These imputation values were chosen arbitrarily; the impact of this choice is discussed in the “Discussion” section.
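
The transformation and boundary imputation described above can be sketched as follows. The function names are illustrative; only the scale factor of 18 and the imputed values 0.005 and 0.995 come from the text.

```python
import math

CDR_MAX = 18.0
Y_FLOOR, Y_CEIL = 0.005, 0.995   # imputed values for scaled scores of 0 and 1


def scaled_logit(cdr_sb):
    """Logit of CDR-SB/18, imputing the boundary values 0 and 1."""
    y = cdr_sb / CDR_MAX
    if y <= 0.0:
        y = Y_FLOOR
    elif y >= 1.0:
        y = Y_CEIL
    return math.log(y / (1.0 - y))


def inverse_scaled_logit(lcdr):
    """Back-transform a logit-scale value to the CDR-SB scale (0, 18)."""
    return CDR_MAX / (1.0 + math.exp(-lcdr))


logit_values = {score: scaled_logit(score) for score in (0.0, 0.5, 2.5, 18.0)}
```

The sketch makes the size of the imputation visible: a score of 0 maps to log(1/199) ≈ −5.29 on the logit scale, whereas the smallest nonzero CDR–SB score (0.5) maps to log(1/35) ≈ −3.56, so boundary observations sit well below the nearest observable score.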

CDR–SB censorings

Subsequent to the early termination of EMERGE and ENGAGE (based on a preplanned futility analysis), all the primary statistical analyses supporting the efficacy of aducanumab were conducted using intent-to-treat (ITT) data available from all randomized and dosed subjects but with the efficacy data after 20 March 2019 censored [5]. The censoring was used to avoid any potential bias in the subjective assessment of CDR–SB data after the futility announcement. Therefore, to remain consistent with the primary statistical analysis, the ER modeling excluded all CDR–SB data observed after 20 March 2019.
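
The data cut described above amounts to a simple date filter. In this sketch, the record layout (a list of dicts with a hypothetical `assessment_date` field) is an assumption; assessments on 20 March 2019 itself are retained because only data after that date were censored.

```python
from datetime import date

CENSOR_DATE = date(2019, 3, 20)   # efficacy data after this date were excluded


def censor_after_cutoff(records, cutoff=CENSOR_DATE):
    """Keep CDR-SB records assessed on or before the cutoff date.

    `records` is assumed to be an iterable of dicts with a hypothetical
    'assessment_date' key holding a datetime.date.
    """
    return [rec for rec in records if rec["assessment_date"] <= cutoff]


# Hypothetical example records for one subject
visits = [
    {"subject": "S01", "week": 50, "assessment_date": date(2019, 2, 11), "cdr_sb": 3.5},
    {"subject": "S01", "week": 78, "assessment_date": date(2019, 8, 12), "cdr_sb": 4.0},
]
kept = censor_after_cutoff(visits)   # only the week-50 record remains
```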

Model development

A sequential approach was adopted to model the relationship between aducanumab exposure and CDR–SB data.

Model selection was guided primarily by changes in the objective function value (OFV) and by the Bayesian information criterion (BIC) [9]. Because the models were not prespecified, model development was data driven, and multiple tests were performed, p-values derived from OFV changes between hierarchical models (assuming a chi-square distribution under the null hypothesis) are difficult to defend; hence, they are neither reported nor used to guide model selection. Final model evaluation also involved (1) goodness-of-fit (GOF) and residual diagnostic plots, (2) assessment of ill conditioning via the condition number of the correlation matrix of the parameter estimates (ratio of largest to smallest eigenvalue < 1000), (3) avoidance of extreme pairwise correlations (ρ > 0.95) between parameter estimates, and (4) plausibility of the parameter estimates.
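
The numerical diagnostics listed above can be computed from an OFV and the covariance matrix of the parameter estimates. This is a minimal sketch assuming BIC = OFV + p·ln(N), where p is the number of estimated parameters and N is the number of observations; conventions for N differ between tools, so BIC values reported for this analysis may have been computed differently. Eigenvalues are obtained with a plain cyclic Jacobi method to keep the example dependency free.

```python
import math


def bic(ofv, n_params, n_obs):
    """BIC from an OFV (-2 log-likelihood up to a constant); N convention assumed."""
    return ofv + n_params * math.log(n_obs)


def correlation_from_covariance(cov):
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]


def symmetric_eigenvalues(a, tol=1e-14, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen to zero a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def stability_checks(cov, max_condition=1000.0, max_abs_corr=0.95):
    """Condition number of the correlation matrix and any extreme pairwise correlations."""
    corr = correlation_from_covariance(cov)
    eig = symmetric_eigenvalues(corr)
    condition_number = eig[-1] / eig[0]
    n = len(corr)
    high_pairs = [(i, j, corr[i][j]) for i in range(n) for j in range(i + 1, n)
                  if abs(corr[i][j]) > max_abs_corr]
    return condition_number, condition_number < max_condition, high_pairs
```

For a 2 × 2 correlation matrix the eigenvalues are 1 ± ρ, so the condition number (1 + ρ)/(1 − ρ) exceeds 1000 only when |ρ| > 0.998. In small models the pairwise criterion (ρ > 0.95) is therefore the stricter of the two, while the condition number also captures near-collinearity involving more than two parameters.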

Modeling strategy

The flowchart depicting the model development is shown in Fig. 1. The strategy is outlined in detail in the subsections below.

Fig. 1

Schematic representation of the model building process

DP model

First, a population model (Eq. 2) characterizing the natural progression of early AD was developed using the pooled placebo data from ENGAGE and EMERGE.

$$L{CDR}_{j}\left({t}_{j}\right)={LCDR}_{0}+\left[r*{t}_{j}\right],$$
(2)

where LCDRj(tj) is the predicted scaled CDR–SB on the logit scale at time tj, r is the rate of CDR–SB progression on the logit scale, and LCDR0 is the baseline scaled CDR–SB on the logit scale. The corresponding prediction on the original open interval (0, 18) is CDRj(tj) = 18/(1 + exp[−LCDRj(tj)]).
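
A short sketch of Eq. 2 on the original scale follows. The baseline of 2.44 is the median baseline CDR–SB reported in the Results, while the weekly logit-scale rate is a hypothetical value. Differentiating the back-transform gives dCDR/dt = r·CDR·(1 − CDR/18): the implied rate on the original scale is largest at mid-range scores and vanishes as scores approach either bound, which is why predictions can never leave (0, 18).

```python
import math


def logit_linear_cdr(baseline_cdr, r, times):
    """Eq. 2 back-transformed: CDR-SB(t) = 18 / (1 + exp(-(LCDR0 + r*t)))."""
    y0 = baseline_cdr / 18.0
    lcdr0 = math.log(y0 / (1.0 - y0))
    return [18.0 / (1.0 + math.exp(-(lcdr0 + r * t))) for t in times]


def cdr_slope(cdr, r):
    """Instantaneous rate on the CDR-SB scale implied by a logit-scale rate r."""
    return r * cdr * (1.0 - cdr / 18.0)


weeks = [0, 26, 50, 78]
r_per_week = 0.006          # hypothetical logit-scale rate (1/week)
pred = logit_linear_cdr(2.44, r_per_week, weeks)
```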

Three DP structural models (Richards, logit linear, and logit power) were assessed, chosen on the basis of the trajectory shapes in the current data set and their common use in the AD literature to describe CDR–SB [10,11,12,13]. The functional forms of these models are provided in the Supplementary Materials. Mixture models on the progression rate were also evaluated to address the large heterogeneity observed in CDR–SB disease progression.

ER model

Following establishment of the DP model, the therapeutic disease-modifying activity of aducanumab was evaluated using the pooled placebo and treatment data. Illustrated through a logit-linear DP model, the effect of aducanumab was modeled as additive (Eq. 3) to the CDR–SB disease progression rate (r) to form the ER model:

$$\frac{dLCDR}{dt}=\left(r-\left({D}_{eff}* {C}_{P}\right)\right),$$
(3)

where r is the rate of CDR–SB progression on the logit scale, LCDR0 is the baseline scaled CDR–SB on the logit scale at time t = 0 (i.e., the initial condition for Eq. 3), Deff is the drug-effect parameter, and CP is the aducanumab concentration at time t. The differential equation form of the model was used because \({C}_{P}\) is a time-varying measure of exposure that modulates the slope of the DP model; in this way, the entire \({C}_{P}\) time course is integrated into the effect of exposure on the disease progression rate.
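
Because Deff·CP enters Eq. 3 linearly, Eq. 3 integrates in closed form to LCDR(t) = LCDR0 + r·t − Deff·AUC(0, t), where AUC(0, t) is the area under the concentration–time curve from time 0 to t. The sketch below evaluates this on a time grid using the trapezoidal rule; the grid, concentrations, and parameter values are hypothetical.

```python
def er_trajectory(lcdr0, r, d_eff, times, conc):
    """Logit-scale CDR-SB under Eq. 3 on a time grid starting at t = 0.

    Integrates dLCDR/dt = r - d_eff * Cp(t) using the trapezoidal rule for the
    cumulative concentration AUC, so LCDR(t) = LCDR0 + r*t - d_eff*AUC(0, t).
    """
    if times[0] != 0 or len(times) != len(conc):
        raise ValueError("times must start at 0 and match conc in length")
    lcdr, auc = [lcdr0], 0.0
    for i in range(1, len(times)):
        auc += 0.5 * (conc[i] + conc[i - 1]) * (times[i] - times[i - 1])
        lcdr.append(lcdr0 + r * times[i] - d_eff * auc)
    return lcdr
```

Under this model, two regimens with the same cumulative AUC at a given time yield the same predicted logit-scale CDR–SB at that time, even if their concentration profiles differ; for example, profiles [0, 50, 50] and [50, 25, 50] on the grid [0, 1, 2] both give an AUC of 75 at t = 2.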

To examine the potential for temporal delays between exposure and clinical outcome, an indirect response (IDR) model was explored in addition to the empirical models above. In this model, aducanumab was assumed to inhibit the formation of Aβ plaque in the brain, resulting in a decline in plaque over time; this reduction in Aβ plaque is hypothesized to slow disease progression.

An IDR equation that represents this mechanism of action is given by the following expression, where \(P(t)\) denotes the amount of plaque in the brain at time \(t\):

$$\frac{dP}{dt}={K}_{in}\left(1-\frac{{c}_{p}}{{EC}_{50}+{ c}_{p}}\right)-{K}_{out}P,$$
(4)

where \({K}_{in}\) denotes the production rate for plaque formation in the brain, \({K}_{out}\) denotes the first-order rate constant that governs the rate of plaque removal, \({c}_{p}\) denotes the aducanumab plasma concentration at time \(t,\) and \({EC}_{50}\) denotes the aducanumab plasma concentration corresponding to 50% inhibition of the production rate for plaque formation. For the purposes of this IDR model, we treat \(P\) as a latent variable since we do not have a direct measurement of the amount of plaque in the brain and, hence, at \(t=0\) we assume the steady-state condition:

$$P\left(0\right)=\frac{{K}_{in}}{{K}_{out}}=1$$
(5)

which implies that \({K}_{in}\)=\({K}_{out}\). Thus, the latent variable \(P(t)\) represents the fraction of plaque remaining at time \(t\): \(P(0)=1\) denotes 100% of the plaque remaining at \(t=0\), and \(P\) decreases over time toward \(P=0\) as \({c}_{p}\) increases and inhibition of \({K}_{in}\) approaches 100%. The IDR equation was incorporated into the DP model for CDR–SB by assuming that the drug effect is proportional to the fraction of plaque removed (\(1-P\)):

$$\frac{dLCDR}{dt}=\left(r-{E}_{max}\left(1-P\right)\right),$$
(6)

where the variables in Eq. 6 are the same as defined in Eq. 3, with the exception that the drug effect proportional to \({c}_{p}\) (\({D}_{eff}\bullet {C}_{P}\)) is replaced by the IDR drug effect, which is proportional to the amount of plaque removed (\({E}_{max}\left(1-P\right)\)). Note that the maximum drug effect (\({E}_{max}\)) as a reduction in the disease progression rate (\(r\)) is achieved when \(P=0\).
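
The coupled IDR system (Eqs. 4 to 6) has no simple closed form for an arbitrary concentration profile, so the sketch below integrates it numerically with a fixed-step fourth-order Runge–Kutta scheme. The concentration function, step size, and parameter values are assumptions for illustration.

```python
def simulate_idr(lcdr0, r, e_max, k_out, ec50, cp, t_end, dt=0.01):
    """Integrate Eqs. 4 and 6 with RK4; returns (P, LCDR) at t_end.

    cp is a callable giving the plasma concentration at time t.
    """
    k_in = k_out  # from P(0) = K_in / K_out = 1 (Eq. 5)

    def rhs(t, p):
        c = cp(t)
        dp = k_in * (1.0 - c / (ec50 + c)) - k_out * p   # Eq. 4
        dl = r - e_max * (1.0 - p)                        # Eq. 6
        return dp, dl

    p, lcdr = 1.0, lcdr0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        # dLCDR/dt depends only on P, so the RK4 stages are driven by P alone
        k1 = rhs(t, p)
        k2 = rhs(t + dt / 2, p + dt / 2 * k1[0])
        k3 = rhs(t + dt / 2, p + dt / 2 * k2[0])
        k4 = rhs(t + dt, p + dt * k3[0])
        p += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        lcdr += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return p, lcdr
```

As a check, a constant concentration equal to EC50 halves the production term, so P approaches the steady state EC50/(EC50 + c) = 0.5; with no drug, P stays at 1 and LCDR follows the logit-linear placebo trajectory.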

The disease progression parameters in Eqs. 3 and 6 were not fixed but were estimated from the combined placebo and active treatment data set. Consistency between the disease progression parameter estimates from the DP model (fitted to placebo data alone) and from the ER model (fitted to the pooled placebo and treatment data) was evaluated to ensure that there was no confounding between the placebo and active treatment data in the context of the parallel design of the two Phase 3 studies.

Covariate model

The effect of covariates was evaluated on both the DP and ER model parameters, namely the baseline logit response, rate of progression, and drug effect. The clinical team had identified age, Mini-Mental State Examination (MMSE) score, ApoE ε4 carrier status, sex, and race as potential factors affecting the CDR–SB score; these covariates were therefore prespecified for evaluation. Covariate analysis used a full covariate modeling approach in which all prespecified covariates were evaluated. Subsequently, stepwise backward elimination (α = 0.001) was used to identify a parsimonious final exposure–CDR–SB model with “information” content similar to that of the full model but with fewer covariates. The significance level in this selection algorithm serves primarily as a tuning parameter to obtain a parsimonious final model; it is not used to infer statistical significance, given the lack of prespecification and the data-driven procedures used to develop the base and full models.
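
The backward elimination step can be sketched as below. The `ofv` callable stands in for refitting the NONMEM model with a given covariate set and is hypothetical, as are the toy OFV contributions. The sketch also assumes that each covariate adds one parameter, so that α = 0.001 corresponds to a critical OFV increase equal to the 0.999 quantile of χ² with 1 degree of freedom, about 10.83; a categorical covariate such as race coded with several levels would need more degrees of freedom.

```python
from statistics import NormalDist


def critical_delta_ofv(alpha):
    """Chi-square (1 df) critical value: P(chi2_1 > x) = alpha <=> x = z_{1-alpha/2}^2."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def backward_elimination(covariates, ofv, alpha=0.001):
    """Repeatedly drop the covariate whose removal raises the OFV least, until
    every remaining removal would raise it by at least the critical value.

    `ofv(frozenset_of_covariates)` is a hypothetical stand-in for a model refit.
    """
    current = frozenset(covariates)
    threshold = critical_delta_ofv(alpha)
    while current:
        base = ofv(current)
        delta = {c: ofv(current - {c}) - base for c in current}
        weakest = min(delta, key=delta.get)
        if delta[weakest] >= threshold:
            break
        current = current - {weakest}
    return current


# Toy OFV: each covariate lowers the OFV by a fixed, made-up amount.
drop = {"age": 25.0, "MMSE": 40.0, "ApoE4": 3.0, "sex": 1.5, "race": 0.4}
toy_ofv = lambda covs: 10000.0 - sum(drop[c] for c in covs)
final = backward_elimination(drop, toy_ofv)
```

With the made-up contributions shown, only age and MMSE are retained, because removing either raises the OFV by more than 10.83.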

Simulations were conducted using the final model to investigate the clinical relevance of the covariates retained in the final model at the 10-mg/kg titration regimen, and the impact of each covariate was displayed using forest plots. The aducanumab treatment effect (ΔΔ) at week 78 was calculated as [treatment CDR–SB change from baseline (CFB) − placebo CDR–SB CFB]. Only parameter uncertainty in the fixed effects was included in these simulations. A total of 1000 subjects were generated for each identified covariate, and the corresponding parameter sets were sampled from their respective multivariate normal distributions (MVNs). These were then used to generate ΔΔ at week 78, with uncertainty expressed as a 90% confidence interval (CI) relative to the reference individual. For continuous covariates, the 5th and 95th percentiles of the entire data set were used in these simulations. The results were compared with a clinical relevance interval calculated as the mean treatment effect of the reference individual ± 0.2 [5]. If the CIs for all identified covariates overlapped the reference prediction CI and were contained within the clinical relevance interval, it was concluded that none of the identified covariates had a clinically relevant effect on ΔΔ.
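
The decision rule for the forest plots can be expressed compactly. In this sketch, `reference` and each entry of `by_covariate` are assumed to be lists of simulated week-78 ΔΔ values (one per sampled parameter set); the 90% interval is taken as the empirical 5th to 95th percentiles, and the ±0.2 margin is the one stated above.

```python
from statistics import fmean, quantiles


def ci90(samples):
    """Empirical 5th and 95th percentiles (90% interval) of simulated values."""
    q = quantiles(samples, n=20, method="inclusive")
    return q[0], q[-1]


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def contained(inner, outer):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def flag_clinically_relevant(reference, by_covariate, margin=0.2):
    """True for covariates whose ΔΔ interval fails the overlap/containment rule."""
    ref_ci = ci90(reference)
    ref_mean = fmean(reference)
    relevance = (ref_mean - margin, ref_mean + margin)
    return {
        name: not (overlaps(ci90(s), ref_ci) and contained(ci90(s), relevance))
        for name, s in by_covariate.items()
    }
```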

Statistical model

The statistical model comprised variance components characterizing interindividual variability (IIV) in model parameters and an intraindividual variance component describing residual error.

IIV in the baseline parameter was modeled as an additive random effect on the logit scale (Eq. 7). IIV in all other parameters was initially assumed to follow a log-normal distribution (Eq. 8), constraining the subject-specific disease progression rate and drug effect to be positive. The deviations from the typical values (η) were described by normal distributions (Eqs. 7 and 8).

$$L{CDR}_{0i}={LCDR}_{0}+{\eta }_{1i},$$
(7)
$${\theta }_{i,n}= {\theta }_{TV}* \mathrm{exp}\left({\eta }_{i,n}\right).$$
(8)

Each random effect (η1, …, ηm) is normally distributed with mean 0 and variance ω² given by the corresponding diagonal element of the covariance matrix Ω. Jointly, the random effects are assumed to follow an MVN distribution (Eq. 9), with the off-diagonal elements of Ω corresponding to the covariances between random effects:

$$\left({\eta }_{1}, \dots , {\eta }_{m}\right) \sim \mathrm{ MVN}\left(0,\Omega \right).$$
(9)

The residual variability was characterized using a logit-normal model (Eq. 10).

$${LRESP}_{i,j}= {\widehat{LRESP}}_{ij}+ {w}_{i,j}*{\varepsilon }_{i,j},$$
(10)

where LRESPi,j denotes the observed scaled CDR–SB on a logit scale, \({\widehat{LRESP}}_{ij}\) denotes the subject-specific prediction, εij denotes the intraindividual (residual) random effect (zero mean and unit variance), and wi,j denotes the additive residual standard deviation on a logit scale.
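
The following sketch simulates one subject under Eqs. 7 to 10: a correlated pair of random effects is drawn from MVN(0, Ω) via a Cholesky factor, applied additively to the logit-scale baseline (Eq. 7) and multiplicatively to a progression rate (Eq. 8), and additive residual error with standard deviation w is added on the logit scale (Eq. 10). It follows the initial log-normal specification for the rate rather than the final model, and all numerical values are hypothetical.

```python
import math
import random


def cholesky(omega):
    """Lower-triangular L with L @ L^T == omega (omega must be positive definite)."""
    n = len(omega)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(omega[i][i] - s)
            else:
                low[i][j] = (omega[i][j] - s) / low[j][j]
    return low


def draw_eta(low, rng):
    """One draw of (eta_1, ..., eta_m) ~ MVN(0, Omega) as eta = L z (Eq. 9)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(low))]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


def simulate_subject(lcdr0_tv, rate_tv, omega, w, times, rng):
    """Logit-scale observations for one simulated subject (logit-linear DP)."""
    eta = draw_eta(cholesky(omega), rng)
    lcdr0_i = lcdr0_tv + eta[0]            # Eq. 7: additive on the logit scale
    rate_i = rate_tv * math.exp(eta[1])    # Eq. 8: log-normal progression rate
    # Eq. 10: prediction plus w * eps, eps ~ N(0, 1)
    return [lcdr0_i + rate_i * t + w * rng.gauss(0.0, 1.0) for t in times]


# Hypothetical values: Omega for (baseline, rate) random effects, residual SD w.
OMEGA = [[0.5, 0.1], [0.1, 0.2]]
rng = random.Random(2024)
obs = simulate_subject(-1.85, 0.006, OMEGA, 0.3, [0, 26, 50, 78], rng)
```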

Simulations

The final ER model was used to simulate different scenarios with the following objectives: (1) to investigate the impact of aducanumab exposure patterns on the differential outcomes of ENGAGE and EMERGE and (2) to assess the time course of the expected treatment effect (ΔΔ, calculated as treatment CDR–SB CFB − placebo CDR–SB CFB) for the observed ENGAGE/EMERGE dosing regimens in the ITT groups (dosing: 3-, 6-, and 10-mg/kg titration; randomization category: low dose, high dose).

Impact of exposure patterns on outcomes

The Phase 3 protocols for both studies, although identical in design, were affected differently by the amendments that altered dosing and dose management, resulting in variable exposure patterns across the two studies [5]. This led to a hypothesis that study differences in exposure patterns affected ENGAGE more than EMERGE, thereby leading to a greater disease attenuation (expressed as Δ = CDR–SB CFB) in EMERGE than in ENGAGE.

To evaluate this hypothesis, the final ER model was fit to EMERGE data (placebo + treatment) and used to predict outcomes for ENGAGE in an external visual predictive check (VPC), with the drug effect either estimated or fixed to zero. The predictions were obtained by simulating 1000 replicates of the observed ENGAGE exposure patterns, considering only sampling uncertainty. If the exposure patterns have minimal impact, the observed mean Δ by treatment group (placebo, low dose, high dose) in ENGAGE would be encompassed within the 95% VPC intervals simulated from the EMERGE model with the estimated drug effect rather than from the model with the drug effect fixed to zero; this finding might suggest that the divergent results are not due to differences in drug effect, but rather to other factors that differed between the two trials. Moreover, an external VPC using a null drug effect could also be used to evaluate whether the ENGAGE treatment groups behaved more like placebo or like the EMERGE treatment groups, given that the prespecified statistical analysis for ENGAGE failed to demonstrate a statistically significant treatment effect relative to placebo.

Treatment effect time profile for the observed dosing regimen

Population predictions and 90% CIs for ΔΔ (CFB difference from placebo) over 18 months of aducanumab treatment in a typical individual were derived by stochastic simulation, by study and ITT group. The simulations were performed by randomly selecting 10,000 subjects with replacement, stratified by study and dosing group, along with their respective dosing patterns. The final ER model, its estimated fixed and random effects, and their uncertainty (reflected as trial-to-trial variation) were used to simulate 1000 clinical trials. Uncertainty was quantified by sampling model parameters from the MVN distribution defined by the final parameter estimates and the corresponding covariance matrix of the estimates, for both the fixed-effects (θ) and random-effects (Ω) parameters. At each planned week of CDR–SB assessment, the median ΔΔ and 90% CIs were generated by study and ITT group.

Model evaluation and validation

An internal qualification of the final ER model was performed through a VPC. One thousand replicates of the data were simulated to build the VPC. If the model describes the average trend and distribution tails appropriately for a given dose, the observed median and extreme percentiles would be expected to lie within the 95% VPC interval of the simulated values [14].
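
The VPC criterion above can also be summarized numerically for one time bin: compute the chosen percentiles of each simulated replicate, form the 95% interval of each percentile across replicates, and check whether the observed percentile falls inside. This is a sketch; the binning, the 5th/50th/95th percentiles, and the linear-interpolation percentile definition are assumptions rather than details taken from the analysis.

```python
def percentile(values, p):
    """Linear-interpolation percentile for p in [0, 1]."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def vpc_check(observed, replicates, probs=(0.05, 0.5, 0.95), level=0.95):
    """For each percentile p, compare the observed p-th percentile in one time
    bin with the interval spanned by that percentile across simulated replicates.

    Returns {p: (observed_value, lower, upper, inside)}.
    """
    tail = (1.0 - level) / 2.0
    summary = {}
    for p in probs:
        sim = [percentile(rep, p) for rep in replicates]
        lower, upper = percentile(sim, tail), percentile(sim, 1.0 - tail)
        obs = percentile(observed, p)
        summary[p] = (obs, lower, upper, lower <= obs <= upper)
    return summary
```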

When heterogeneity in the data necessitated mixture models, the VPCs were modified so that each simulated subject was assigned to a mixture group using empirical Bayes estimates, consistent with the assignment made for the observed data. This was implemented by setting MAXEVAL = 0 for each simulated data set and assigning MIXNUM according to each subject's individual probability of belonging to a subpopulation. The simulations were then summarized by mixture group allocation and displayed graphically [14, 15].

Software

Data were analyzed by nonlinear mixed-effects modeling using NONMEM version 7.4.3 [16], run on a Linux cluster. First-order conditional estimation with interaction was used for parameter estimation. R [17] (version 3.1.3 and above) was used in conjunction with NONMEM for automation and postprocessing.

Results

Data summary

The final assembled modeling data comprised 3283 subjects (placebo, n = 1092; treated, n = 2191) who contributed to 11,088 CDR–SB measurements (placebo, n = 3715; treated, n = 7373).

The mean CDR–SB CFB by treatment arm and study is presented in Supplementary Fig. 2. The low-dose arms show similar mean CFB trajectories in both studies, whereas the high-dose arms appear discordant between studies.

The individual CDR–SB time profiles by study and randomized treatment arm are provided in Supplementary Fig. 3. This figure suggests that the rate of progression of the median profile is approximately linear on both the original and logit scales within the placebo-controlled period. However, a few subjects in all treatment groups in both studies had CDR–SB scores ≥ 12, and a linear model would generate predictions > 18, especially for subjects with high CDR–SB scores at later times. Therefore, the modeling was conducted on the logit scale to constrain model predictions between 0 and 18.

Because the trials were terminated early following the futility analysis, there were fewer CDR–SB observations at week 78 than at earlier time points, contributing to some imbalance in the data set. Most subjects were white (76.8%) and ApoE ε4 carriers (68.4%), with a median baseline MMSE (MMSE-BL) score of 26.4 and a median baseline CDR–SB score of 2.44. Sex was evenly distributed (female, 48.1%; male, 51.9%) (Table 2).

Table 2 Summary of demographics and baseline characteristics of subjects included in the exposure–response analysis

DP model

A summary of key model development runs is provided in Supplementary Table 1. As a starting point, the Richards model, previously used to characterize CDR–SB progression in AD registry data [9], was compared with the logit-linear and logit-power progression models to select the best base structural model. The logit-linear model was the most stable; the other two models yielded highly correlated estimates (ρ > 0.95) of the shape parameter and the rate of progression. This can be attributed to the progression rate within subjects being approximately logit linear over the 18-month study duration. Therefore, the logit-linear model was used as the final base structural model for further analysis.

However, when applying this model, three major observations were noted. First, subject-specific estimates of the progression rate parameter suggested large heterogeneity and the presence of several (two or more) distinct classes of disease progression rates. Second, a proportion of subjects had disease improvement (negative rate) with CDR–SB scores decreasing with time. Third, baseline CDR–SB scores were higher in ApoE ε4 carriers than in noncarriers.

To capture the heterogeneity in the DP rate, a mixture model on the progression rate was implemented, allowing subjects to be assigned to one of three subpopulations (using the $MIX feature in NONMEM). The second observation was addressed by changing the IIV model on rate from a log-normal distribution (a multiplicative model that constrains the rate to have the same sign within a mixture group) to a normal distribution with additive random effects. This allows subjects to be either progressors (positive rate) or nonprogressors (negative rate) without a sign constraint, avoiding potential discontinuities in the distribution. Last, separate baseline parameters were estimated by ApoE ε4 carrier status. A common (pooled) estimate of IIV on rate was used for all mixture subpopulations, because separate estimates for each subpopulation resulted in ill-conditioned models (condition number > 1000; results not shown). In addition, a block covariance structure for Ω was used.

Implementing these changes decreased the OFV by ≈ 167 units, indicating that the model could distinguish these subpopulations and the other identified features. However, the resulting model was unstable because of near-perfect correlation between baseline and rate of progression.

While some correlation between baseline and progression rate may be expected, the near-perfect positive correlation between the random effects for baseline and slope is an oddity, probably attributable to limitations of the experimental design, such as the sparsity of the CDR–SB data and the decreasing sample size at later observation times. AD is a progressive disease that does not improve with time if untreated. Yet, after study entry, subjects with similar baseline values progress quite differently: some progress rapidly, whereas others show no progression (or apparent recovery/improvement). Currently, no prognostic factors are known to explain these divergent trajectories; hence, a mixture model was used to describe the heterogeneity in disease progression, including a small proportion of subjects with no progression (or recovery/improvement). Future research may identify prognostic factors that explain this apparent heterogeneity. To resolve the instability, the random effect for disease progression was assumed to be proportional to the random effect for baseline, constraining the two effects to be perfectly correlated. These changes resulted in a stable model with a successful covariance step and no change in OFV.
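
Writing the constraint out shows why it removes the instability. Let λ denote the proportionality constant (a symbol introduced here for illustration), so that the rate random effect is ηr,i = λ·η1,i with η1,i ~ N(0, ω1²):

$$\mathrm{Var}\left({\eta }_{r,i}\right)={\lambda }^{2}{\omega }_{1}^{2},\quad \mathrm{Cov}\left({\eta }_{1,i},{\eta }_{r,i}\right)=\lambda {\omega }_{1}^{2},\quad \rho =\frac{\lambda {\omega }_{1}^{2}}{{\omega }_{1}\cdot \left|\lambda \right|{\omega }_{1}}=\mathrm{sign}\left(\lambda \right)$$

The implied 2 × 2 block of Ω has determinant ω1²·λ²ω1² − (λω1²)² = 0, so a single variance and a scale factor replace a full covariance block whose correlation had been pushed to the boundary; a positive correlation corresponds to λ > 0.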

Therefore, the three-subpopulation mixture model with additive IIV on progression rate and baseline stratified by ApoE ε4 status was selected as the final base model to characterize the CDR–SB disease progression in subjects who received placebo. The identified mixture classes are referred to as slow, typical, and fast progressors. The proportion of typical progressors was estimated to be ≈ 83.5% of the overall population, with the regressors (slow progressors) and fast progressors estimated to be 3.2% and 13.3%, respectively. The disease progression rate of fast progressors was approximately fourfold faster than that of typical progressors (Table 3, columns 3 and 4). No obvious trends were observed in the standard GOFs of the base DP model (Supplementary Fig. 4; Fig. 5).
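
A simplified view of how a subject is allocated to a progression class follows. Given the class proportions and class-specific typical rates, Bayes' rule gives each subject's posterior class probability, and the subject can be assigned to the most probable class. This sketch treats a subject-level logit-scale slope as if it were observed, with a between-subject SD shared across classes (mirroring the pooled IIV on rate), whereas NONMEM bases these probabilities on each subject's individual likelihood under each subpopulation. The proportions are the estimates above; the class rates and SD are hypothetical, with the fast rate set about fourfold higher than the typical rate.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def class_probabilities(slope, proportions, class_rates, sd):
    """Posterior class probabilities by Bayes' rule (simplified; see text)."""
    weights = {k: proportions[k] * normal_pdf(slope, class_rates[k], sd)
               for k in proportions}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def assign_class(slope, proportions, class_rates, sd):
    """Most probable class for a subject (hypothetical assignment rule)."""
    probs = class_probabilities(slope, proportions, class_rates, sd)
    return max(probs, key=probs.get)


PROPORTIONS = {"slow": 0.032, "typical": 0.835, "fast": 0.133}  # estimates above
RATES = {"slow": -0.002, "typical": 0.005, "fast": 0.020}       # hypothetical
SD = 0.003                                                       # hypothetical
```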

Table 3 Parameter estimates of the base population exposure–CDR–SB response model

Exposure–CDR–SB model

At this step of the modeling analysis, the base DP model identified with the placebo data was updated to include the therapeutic activity of aducanumab as an additive effect (as illustrated in Eq. 3) and subsequently evaluated based on all randomized subjects from the placebo-controlled periods of the two Phase 3 studies. An additive aducanumab effect on CDR–SB was assumed because this allowed for a beneficial treatment effect in both the progressors and regressors (slow progressors). The parameters relating to the disease progression were not fixed but were estimated from the combined placebo-active treatment data set to ensure no confounding between the placebo and active treatment data in the context of the parallel Phase 3 design.

Power and \({E}_{max}\) models, along with a direct proportional (linear) form, were also evaluated during model development. The \({E}_{max}\) and power models produced only minimal reductions in OFV (ΔOFV < 1) relative to the direct proportional model (Supplementary Table 2). Furthermore, the \({E}_{max}\) model was unstable: the estimates of \({E}_{max}\) and the concentration producing half the maximal effect (\({EC}_{50}\)) were highly correlated (> 0.95) and implausible, indicating an ER relationship with little curvature that is approximately linear over the observed range of exposures. Moreover, the \({EC}_{50}\) estimate was not close to zero, indicating that the drug effect was not constant across treatments; thus, a step function for the drug effect was not pursued. Therefore, a linear model was selected as the most parsimonious form to characterize the drug effect.
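
One way to see why \({E}_{max}\) and \({EC}_{50}\) become nearly inseparable when the relationship is close to linear: for concentrations well below \({EC}_{50}\), the \({E}_{max}\) model reduces to a linear model whose slope is the ratio of the two parameters.

$$E\left(C\right)=\frac{{E}_{max}\,C}{{EC}_{50}+C}=\frac{{E}_{max}}{{EC}_{50}}\,C\left(1-\frac{C}{{EC}_{50}}+\frac{{C}^{2}}{{EC}_{50}^{2}}-\cdots \right)\approx \frac{{E}_{max}}{{EC}_{50}}\,C\quad \left(C\ll {EC}_{50}\right)$$

In that regime, only the ratio \({E}_{max}/{EC}_{50}\), which plays the role of \({D}_{eff}\) in Eq. 3, is informed by the data, which is one plausible reading of the high correlation between the two estimates.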

Fits of the base DP model to the placebo data indicated that baseline CDR–SB scores for ApoE ε4 carriers and noncarriers were generally similar, suggesting that a common baseline parameter may be adequate to describe the baseline response. Hence, ApoE ε4 carrier status was not considered as a covariate on baseline CDR–SB. Thus, the model incorporating both placebo and aducanumab treatment arms, with an additive drug effect and a shared IIV random effect between baseline and rate, was identified as the base ER model. The base DP model was also fit to the combined data (placebo + treatment), which resulted in a higher BIC (Table 3), supporting the conclusion that an ER model with a drug-effect component is more parsimonious.

A comparison of the ER model fit (Table 3, columns 5 and 6) with the base DP model fit (Table 3, columns 7 and 8) to the combined placebo and treatment data showed that the ER model is more parsimonious (lower BIC), with a CI for Deff that does not include 0. Furthermore, the estimates of the baseline, rate of progression, and progression-class proportions are similar between the DP model fit to the placebo data alone (Table 3, columns 3 and 4) and the ER model fit to the combined placebo and treatment data (Table 3, columns 5 and 6), suggesting that there is no confounding between the placebo and treatment effects. This is a notable finding because it indicates that the ER model can characterize the natural disease progression while also estimating the drug effect. Appreciable lack of fit in the ER model might bias the estimates of the DP model parameters; this does not appear to be the case here, because the DP parameter estimates are similar between the ER model and the DP model fit to the placebo data alone. Moreover, no appreciable lack of fit in ER model predictions was observed based on the GOF diagnostics (Supplementary Figs. 6, 7).

To evaluate the potential for an IDR effect, the same three-subpopulation mixture base model was refitted, replacing the direct-response drug effect (proportional to the model-predicted concentration) with the IDR drug effect. A comparison of both model fits (Supplementary Table 3) showed that BIC was higher with the IDR model, suggesting that the direct-response model is more parsimonious. Moreover, \({E}_{max}\), \({EC}_{50}\), and \({K}_{out}\) were poorly estimated (relative standard error > 50%), suggesting that the data are currently insufficient to support the IDR model. Furthermore, all other model parameters (baseline, the progression rates for each of the three subpopulations, and the corresponding mixture proportions) were similar between the IDR and direct-response base models. Thus, subsequent model development continued to build on the direct-response base model.
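The exact latent-variable IDR structure is not written out in this section. The sketch below uses one common way to introduce a delay, a first-order latent effect dE/dt = kout·(Emax·C/(EC50 + C) − E) with E(0) = 0, solved by Euler steps for a constant concentration. This formulation and all parameter values are assumptions for illustration, not the authors' model.

```python
import math


def latent_effect(conc, kout, emax, ec50, t_end, dt=0.01):
    """Euler solution of dE/dt = kout * (Emax*C/(EC50 + C) - E), E(0) = 0,
    for a constant concentration C. Returns (E(t_end), steady-state effect)."""
    target = emax * conc / (ec50 + conc)  # direct-response (no-delay) effect
    e = 0.0
    for _ in range(int(round(t_end / dt))):
        e += dt * kout * (target - e)
    return e, target


# Hypothetical parameters; time units are arbitrary.
for kout in (0.1, 1.0, 10.0):
    e, target = latent_effect(conc=50.0, kout=kout, emax=1.0, ec50=100.0, t_end=1.5)
    print(f"kout = {kout:>4}: E(1.5) / steady state = {e / target:.3f}")
```

For large kout the latent effect tracks the direct-response effect almost immediately, so with only a few post-baseline visits the data carry little information to separate kout from Emax and EC50, which is consistent with the poor precision reported above.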

Ad hoc runs with the base ER model

Following establishment of the base ER model, an ad hoc exploratory assessment of the base ER model was conducted. The objective of this assessment was to evaluate the following: (1) whether the model can be used to estimate the differential drug effect across the three progression classes identified and (2) whether the model can be used to estimate the differential drug effect across the two studies.

Run 6 was modified to estimate separate drug effects for each disease progression group (mixture class) to form run 7. Run 6 was also modified to estimate separate drug effects for each study but common across disease progression groups to form run 8. Run 8 was then modified to constrain the drug effect to 0 for the ENGAGE study to form run 9. A high-level summary of these runs is shown in Table 4. Run 6 with a common drug effect across disease progression classes and between studies had the lowest BIC, suggesting that the base ER model is the most parsimonious. The parameter estimates for runs 6 and 8 are shown in Supplementary Table 4. It is noteworthy that the drug effect in ENGAGE was estimated to be approximately twofold lower than that in EMERGE; however, the CI for the drug effect for ENGAGE does not include 0.

Table 4 Summary of ad hoc exposure–response model development runs

To further evaluate potential study-by-mixture interactions in the drug effect, in which a study effect may differ across mixture classes, additional ad hoc runs were formed using run 6 (common drug-effect model) or run 7 (different drug effects for each mixture class) as the reference model. A complete factorial of ad hoc model runs evaluating study and study-by-mixture interaction drug effects is summarized in Supplementary Table 5. Among all these ad hoc runs, the base ER model (run 6), with a common drug effect between studies and across mixture classes, continued to have the lowest BIC, supporting the parsimony of the base ER model.

Moreover, this model was stable, with good precision of the estimates (95% CIs do not include 0; Table 3); therefore, run 6 was selected as the final base ER model for subsequent model development.

Examination of the impact of exposure patterns between studies using a simulation-based approach showed that the 95% CIs of 1000 outcomes simulated from a model fit to EMERGE (placebo + treatment) data encompassed the observed ENGAGE mean (Δ) over time (Fig. 2). This suggests that the ENGAGE treatment results agree better with predictions based on EMERGE parameter estimates that include a drug effect than with those based on the null model. This is particularly noticeable for the low-dose treatment group.
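The coverage check behind this comparison can be sketched in a few lines. The simulated trial means and the observed mean below are synthetic numbers drawn from hypothetical normal distributions. They stand in for the 1000 model-based simulations, one set with the EMERGE-derived drug effect and one with the null drug effect, and are not ENGAGE data.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a pre-sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def interval_covers(observed_mean, simulated_means, level=0.95):
    """Does the observed mean fall inside the central `level` interval of the simulations?"""
    s = sorted(simulated_means)
    tail = (1 - level) / 2
    lower, upper = percentile(s, tail), percentile(s, 1 - tail)
    return lower <= observed_mean <= upper, (lower, upper)


rng = random.Random(2021)
# Hypothetical simulated mean CFB at one visit: 1000 trials per model.
sim_with_drug = [rng.gauss(1.30, 0.15) for _ in range(1000)]
sim_null_drug = [rng.gauss(1.80, 0.15) for _ in range(1000)]
observed_mean = 1.25  # hypothetical observed mean change from baseline

print("drug-effect model covers:", interval_covers(observed_mean, sim_with_drug)[0])
print("null model covers:", interval_covers(observed_mean, sim_null_drug)[0])
```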

Fig. 2

Visual characterization of ENGAGE and EMERGE mean CDR–SB change from baseline across the randomized intent-to-treat groups (placebo, low dose, and high dose) using model parameters estimated from EMERGE data. The red open circles and lines represent the observed mean values for ENGAGE, and the shaded regions represent 95% confidence intervals from a model fit to EMERGE data with either the estimated drug effect or a null drug effect. CDR–SB Clinical Dementia Rating–Sum of Boxes, eff effect

Final model development

A covariate search was performed using the model with a common drug effect across mixture classes and studies to assess the influence of the prespecified covariates on the DP and ER model parameters (Supplementary Table 6).

The final model included three covariate relationships: (1) the baseline CDR–SB estimate was 15% lower in Asian subjects; (2) baseline CDR–SB scores were typically higher in subjects with lower MMSE-BL scores; and (3) the rate of progression was typically higher in subjects with lower MMSE-BL scores. Inclusion of covariates explained ≈ 6% of the overall variance in baseline CDR–SB and ≈ 8% of that in the rate of progression. Equations 11–14 represent the final model.

$$\mathrm{LCDR}_{i}(t_{j})=\mathrm{LCDR}(t_{j})+\varepsilon_{ij}$$
(11)

where LCDR(t) is the numerical solution to Eq. 3 with the initial condition that LCDR0i = LCDRi(0).

$$\mathrm{LCDR}_{0i}=\mathrm{LCDR}_{0}-0.151\cdot I_{\mathrm{Asian}}-0.0682\cdot\left(\mathrm{MMSEBL}_{i}-26\right)+\eta_{1i},$$
(12)
$$r_{i}=r-4.38\times10^{-6}\cdot\left(\mathrm{MMSEBL}_{i}-26\right)+\theta\cdot\eta_{1i},$$
(13)
$$\mathrm{CDR}_{i}(t_{j})=18\cdot\frac{\exp\left(\mathrm{LCDR}_{i}(t_{j})\right)}{1+\exp\left(\mathrm{LCDR}_{i}(t_{j})\right)},$$
(14)

where LCDRi(tj) is the scaled CDR prediction on a logit scale, CDRi(tj) is the CDR prediction in the original open interval scale (0, 18), r is the typical individual prediction of the rate of CDR–SB progression on the logit scale, LCDR0 is the baseline scaled CDR typical individual prediction on a logit scale, Deff characterizes the pharmacological effect of the drug, Cavg is a measure of exposure, and θ is a variance parameter that allows the rate of progression to have a different IIV magnitude than the baseline logit while constraining the correlation to 1.
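Equations 12–14 translate directly into code. Eq. 3, which governs LCDR(t), is not reproduced in this section, so the following sketch stops at the covariate equations and the back-transformation. The covariate coefficients are those shown in Eqs. 12 and 13; the typical value LCDR0 = −1.5 is a hypothetical placeholder because the Table 5 estimates are not reproduced here.

```python
import math


def baseline_logit(lcdr0, mmse_bl, is_asian, eta1=0.0):
    """Eq. 12: individual baseline logit-scaled CDR-SB.
    The reference individual is non-Asian with a baseline MMSE of 26."""
    return lcdr0 - 0.151 * (1 if is_asian else 0) - 0.0682 * (mmse_bl - 26) + eta1


def progression_rate(r, mmse_bl, theta, eta1=0.0):
    """Eq. 13: individual progression rate (logit scale); the baseline random
    effect eta1 is reused, scaled by theta, so the correlation is 1."""
    return r - 4.38e-6 * (mmse_bl - 26) + theta * eta1


def to_cdr_sb(lcdr):
    """Eq. 14: back-transform a logit-scale value to CDR-SB on (0, 18)."""
    return 18.0 * math.exp(lcdr) / (1.0 + math.exp(lcdr))


# Hypothetical typical value (the Table 5 estimate is not reproduced here).
LCDR0 = -1.5
for mmse, asian in [(26, False), (22, False), (26, True)]:
    b = baseline_logit(LCDR0, mmse, asian)
    print(f"MMSE-BL={mmse}, Asian={asian}: baseline CDR-SB = {to_cdr_sb(b):.2f}")
```

Because η1i enters both Eq. 12 and Eq. 13, a single random draw per subject shifts the baseline and, scaled by θ, the progression rate, which is how the model constrains their correlation to 1.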

The NONMEM parameter estimates and the bootstrap-derived nonparametric 95% CIs for the final model are provided in Table 5. All fixed-effects model parameters were adequately estimated, with a relative standard error < 30%. The significance of the included covariates was further supported by the fact that none of the 95% CIs for the covariate effects included zero.

Table 5 Parameter estimates of the population exposure–CDR–SB response for the aducanumab final model

The effect of covariates is displayed in a forest plot (Fig. 3). This plot displays ΔΔ at week 78, with uncertainty (expressed as 90% CI), for each identified covariate relative to the reference individual (non-Asian with a baseline MMSE score of 26) for a population with a dose titrated to 10 mg/kg. As shown in Fig. 3, the CIs for all identified covariates overlap the reference prediction CI and are contained within the clinical equivalence region (indicated by the dotted lines), except for subjects with lower MMSE-BL, whose CI has a lower bound located slightly outside the clinical equivalence region. Therefore, it can be concluded that none of the identified covariates have significant or clinically relevant effects on ΔΔ (with a possibly inconclusive result for the Asian race effect, since its CI is not completely contained within the equivalence region). This is not surprising, because all covariate effects included in the final model influence only baseline CDR–SB scores and the rate of progression, not the drug effect.
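The reading of the forest plot can be expressed as a small decision rule. The sketch assumes a common three-way convention: a CI fully inside reference ± 0.2 counts as "contained", a CI entirely beyond a bound counts as "outside", and a CI crossing a bound counts as "inconclusive". The reference value and CIs are hypothetical and are not read from Fig. 3.

```python
def classify_ci(ci_lower, ci_upper, reference, margin=0.2):
    """Classify a covariate-level CI for the week-78 treatment difference against
    the equivalence region reference +/- margin."""
    lo, hi = reference - margin, reference + margin
    if lo <= ci_lower and ci_upper <= hi:
        return "contained"      # no clinically relevant covariate effect
    if ci_upper < lo or ci_lower > hi:
        return "outside"        # clinically relevant covariate effect
    return "inconclusive"       # CI crosses an equivalence bound


# Hypothetical reference value and CIs, for illustration only.
reference = -0.35
for label, ci in {"A": (-0.45, -0.25), "B": (-0.60, -0.30), "C": (-0.90, -0.60)}.items():
    print(label, classify_ci(*ci, reference))
```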

Fig. 3

Covariate effects on the exposure–CDR–SB model. This figure is representative of an individual in the typical progression mixture whose dose is titrated to 10 mg/kg without any interruption. CDR–SB difference from placebo represents the difference in change from CDR–SB baseline at week 78 between treatment arms and placebo. The reference individual is a White/other race subject with an MMSE score of 26. The vertical solid line represents the null treatment effect. The dotted lines represent reference values ± 0.2 (meaningful signal) in the clinical effect. BL baseline, CDR–SB Clinical Dementia Rating–Sum of Boxes, CI confidence interval, MMSE Mini-Mental State Examination

Diagnostic plots from the final ER model, including GOF plots pooled across studies (Fig. 4) or by study (Supplementary Fig. 8), show no detectable bias. The predictive performance of the final model was also evaluated by VPC. CDR–SB time profiles were simulated using the observed dosing patterns from the Phase 3 trials, incorporating sampling uncertainty. VPCs stratified by ITT group (placebo, low dose, high dose) (Fig. 5a; Supplementary Fig. 9a) or by dosing group (3-, 6-, and 10-mg/kg titration groups) across studies and mixture classifications (Fig. 5b; Supplementary Fig. 9b) show that the predictions were in good agreement with the observed data, supporting the adequacy of the predictive performance of the final ER model.
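The core VPC computation can be sketched as follows: at each visit, simulate many replicate datasets of the observed size, take the median of each, and ask whether the observed median falls within the central 95% of the simulated medians. The `toy_model` function is a hypothetical stand-in for simulating from the fitted ER model with observed dosing, and the observed values are invented so that one visit deliberately fails the check.

```python
import random
import statistics


def vpc_median_check(observed_by_time, simulate, n_rep=1000, level=0.95, seed=0):
    """For each visit, compare the observed median with the central `level`
    interval of medians from n_rep simulated replicates of the same size."""
    rng = random.Random(seed)
    k_lo = int((1 - level) / 2 * n_rep)
    k_hi = n_rep - 1 - k_lo
    results = {}
    for t, obs in observed_by_time.items():
        sim_medians = sorted(
            statistics.median(simulate(t, len(obs), rng)) for _ in range(n_rep)
        )
        obs_median = statistics.median(obs)
        lo, hi = sim_medians[k_lo], sim_medians[k_hi]
        results[t] = (obs_median, lo, hi, lo <= obs_median <= hi)
    return results


def toy_model(t_weeks, n, rng):
    """Hypothetical stand-in for the fitted model: CDR-SB drifts upward linearly."""
    return [rng.gauss(2.0 + 0.02 * t_weeks, 1.0) for _ in range(n)]


observed = {0: [1.0, 2.0, 3.0], 52: [7.0, 8.0, 9.0], 78: [2.56, 3.56, 4.56]}
for t, (med, lo, hi, ok) in sorted(vpc_median_check(observed, toy_model).items()):
    print(f"week {t}: observed median {med:.2f}, 95% interval ({lo:.2f}, {hi:.2f}), inside={ok}")
```

A full VPC would repeat this for several percentiles (e.g., 5th, 50th, 95th) and for each stratification, but the per-visit comparison is the same.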

Fig. 4

Goodness-of-fit plots for the final population exposure–CDR–SB model (run 11). The circles represent individual observations, the red lines represent the unity lines, and the blue lines represent a LOESS smoother. CDR–SB Clinical Dementia Rating–Sum of Boxes, LOESS locally estimated scatterplot smoothing

Fig. 5

Visual predictive checks of the final exposure–CDR–SB model. a Predictive checks stratified by intent-to-treat dosing groups, study, and mixture categories. b Predictive checks stratified by dose and study. The solid lines represent the median of the observed data. The shaded areas are the 95% confidence intervals of the corresponding percentiles of the simulated data. CDR–SB Clinical Dementia Rating–Sum of Boxes, prog progressor, Typ typical

Discussion

An ER model was developed characterizing the effect of aducanumab exposure on CDR–SB scores in subjects with AD from two Phase 3 studies: ENGAGE and EMERGE. A model-based approach allowed integration of the results across the two studies, characterization of the placebo effect, and separation of the time course of the placebo effect from that of the drug effect.

A linear model in the logit-scaled CDR–SB best described the time profile of CDR–SB in the placebo group. To account for the heterogeneity in disease progression rates, a mixture model was used that assigned subjects to one of three latent classes: (1) slow progressors (individuals with no progression over the 78-week period), (2) typical progressors, and (3) fast progressors. Additive IIV random effects were assumed for all classes to accommodate both positive and negative progression rates.

The decrease in CDR–SB scores from baseline in a small subpopulation of patients was a challenging aspect of model development, since it could affect the estimation of disease progression in the majority of patients, whose CDR–SB scores increased with time as expected. A mixture model made it possible to account for these features of the data without compromising other aspects of the DP model. The model identified most subjects as typical (84%) or fast (12%) progressors, whereas the slow-progressing population was ≈ 4%, indicating that these subjects did not transition to AD status during the placebo-controlled period. This finding was consistent with earlier published literature [10, 12, 13].
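How a mixture model assigns a subject to a latent class can be illustrated with Bayes' rule. This is a simplified sketch: it scores a single hypothetical progression rate against normal class distributions, whereas the fitted model uses each subject's full longitudinal likelihood. The mixing proportions are those reported above; the class means and SDs are hypothetical.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def class_posteriors(rate, classes):
    """Bayes' rule: P(class k | rate) is proportional to p_k * N(rate; mu_k, sd_k)."""
    weights = {k: p * normal_pdf(rate, mu, sd) for k, (p, mu, sd) in classes.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Mixing proportions from the text; class means and SDs are hypothetical
# (logit units per year) and chosen only for illustration.
classes = {
    "slow": (0.04, 0.0, 0.1),
    "typical": (0.84, 0.5, 0.2),
    "fast": (0.12, 1.2, 0.3),
}
for rate in (-0.05, 0.55, 1.4):
    post = class_posteriors(rate, classes)
    print(rate, max(post, key=post.get), {k: round(v, 3) for k, v in post.items()})
```

Even with a 4% prior weight, a subject whose rate is near zero can be assigned to the slow class, which is how the mixture absorbs non-progressing or improving subjects without distorting the typical-progressor estimates.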

Limitations of the logit transformation for characterizing bounded data with imputed boundaries have been identified [18]. However, < 1% (46/11088) of the observations were on the boundary of the CDR–SB scale (all 46 at 0 and none at 18). Therefore, selecting the arbitrary values of 0.005 and 0.995 as the imputed boundaries for the scaled CDR–SB, to permit use of the logit transformation, is not expected to be influential. This was studied further in a sensitivity analysis using a less arbitrary imputation, in which the scaled values were set to correspond to the midpoint between the boundary and the nearest nonboundary outcome value. That is, for the lower boundary we used the scaled CDR–SB value of 0.5/18 = 0.0278 (midpoint between 0 and 1), and for the upper boundary we used 17.5/18 = 0.9722 (midpoint between 17 and 18). When the final model was refit with this alternative imputation scheme, the parameter estimates (fixed and random effects) changed by < 1% for all parameters except the rate of progression for the slow progressors, which changed by ≈ 10%, indicating that the arbitrariness of the boundary imputation did not unduly influence the estimation of the model parameters.
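The scaling, boundary imputation, and logit transformation described above can be sketched as a single function, with the two imputation schemes passed as parameters. The function name is illustrative; only the scale (0 to 18) and the imputed values come from the text.

```python
import math


def scaled_logit(cdr_sb, lower=0.005, upper=0.995):
    """Scale CDR-SB (0-18) to the unit interval, replace boundary values with
    the imputed values, and return the logit."""
    p = cdr_sb / 18.0
    if p <= 0.0:
        p = lower
    elif p >= 1.0:
        p = upper
    return math.log(p / (1.0 - p))


# Primary scheme vs. the midpoint-based sensitivity scheme described above.
schemes = {"primary": (0.005, 0.995), "midpoint": (0.5 / 18, 17.5 / 18)}
for name, (lo, hi) in schemes.items():
    print(name, [round(scaled_logit(s, lo, hi), 3) for s in (0.0, 1.0, 9.0, 18.0)])
```

Only boundary scores are affected by the choice of scheme (here, the 46 scores of 0), which is why the sensitivity analysis mainly moved the slow-progressor rate.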

The DP model fitted to the placebo data was augmented to include the therapeutic activity of aducanumab as an additive effect on the progression rate. The model is able to adequately estimate the drug effect with good precision, as indicated by the drug-effect slope estimate of 2.01 L/(g·year), with a 95% CI (1.56, 2.47) that does not include 0. The drug effect is successfully separated from the disease progression rates, as evidenced by the consistency of the placebo disease progression rate estimates and the disease progression class proportions between the DP and ER models. The ER for aducanumab effects on disease progression was thereby successfully characterized.
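The units of the drug-effect slope can be checked with simple arithmetic: Deff in L/(g·year) multiplied by Cavg in g/L gives a rate in 1/year on the logit scale. The sketch converts a hypothetical Cavg of 100 µg/mL (1 µg/mL = 1 mg/L = 10⁻³ g/L) and applies the point estimate and 95% CI quoted above. The direction in which this shift acts on the progression rate is defined by Eq. 3, which is not reproduced in this section.

```python
D_EFF = 2.01                 # L/(g*year), point estimate quoted in the text
D_EFF_CI = (1.56, 2.47)      # 95% CI quoted in the text


def rate_shift(cavg_ug_per_ml, d_eff=D_EFF):
    """Change in the logit-scale progression rate (per year) for a given Cavg.
    1 ug/mL = 1 mg/L = 1e-3 g/L, so the product Deff * Cavg is per year."""
    return d_eff * cavg_ug_per_ml * 1e-3


cavg = 100.0  # hypothetical average concentration in ug/mL, for illustration only
print(f"point estimate: {rate_shift(cavg):.3f} logit units/year")
print("95% CI:", tuple(round(rate_shift(cavg, d), 3) for d in D_EFF_CI))
```

Because of the logistic back-transformation in Eq. 14, a given logit-scale shift corresponds to a CDR–SB change of roughly 18·p(1 − p) times the shift, where p = CDR–SB/18, so the same drug effect translates into different CDR–SB changes depending on disease stage.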

The impact of a delay between aducanumab exposure and changes in CDR–SB was also investigated using a latent variable IDR model. However, this model yielded poor precision for some of the parameter estimates. This could be attributed to the limited longitudinal data post baseline (only two measurements post baseline) and to the announcement of futility: ≈ 55% of the subjects had either only a baseline measurement or one additional measurement, never having an opportunity to complete the studies.

The apparent study-by-ApoE ε4 interaction of the observed mean CDR–SB CFB (Δ) within the placebo groups was noteworthy (Supplementary Table 7). Using the mixture classification from the final DP model, the placebo arm mean Δ was summarized by study, progression group, and carrier status. From the results presented in Supplementary Table 8, it can be deduced that the apparent qualitative study-by-ApoE ε4 interaction observed earlier is likely due to heterogeneity in AD, since this qualitative interaction was no longer observed when the comparison was made within the more homogeneous population of typical progressors.

Covariate analysis identified effects of baseline MMSE score and race on baseline CDR–SB score and an effect of baseline MMSE score on the rate of progression. None of these covariates were determined to be clinically relevant (Fig. 3).

Models with differential drug effects by study and progression class were estimated; however, none of them were as parsimonious as the final ER model, with a common drug effect between studies and among progression rate classes.

Investigation of the impact of exposure patterns on outcomes (Fig. 2) showed that the ENGAGE treated groups were better predicted by the model with the EMERGE-derived drug effect than by the placebo model with the drug effect fixed to zero. These results suggest that the effect of aducanumab in ENGAGE is more like that in EMERGE than like placebo, indicating that differences in exposure patterns alone do not explain the differential outcomes observed between the Phase 3 studies. Taken together, the various model runs and analyses support the view that the two studies reflect a common behavior and pharmacology of aducanumab, and that other factors may have contributed to the discordant results between studies.

The mean treatment difference (ΔΔ, defined as the CFB difference from placebo) with aducanumab over time and its uncertainty are presented in Fig. 6. Assuming dosing patterns as observed in ENGAGE and EMERGE, the predicted mean (90% CI) ΔΔ in CDR–SB at week 78 was − 0.208 (− 0.293, − 0.118) and − 0.350 (− 0.483, − 0.221) in the low-dose and high-dose groups, respectively. The corresponding values for the 3-, 6-, and 10-mg/kg aducanumab titration groups were − 0.157 (− 0.231, − 0.0797), − 0.260 (− 0.369, − 0.148), and − 0.350 (− 0.483, − 0.221), respectively. These simulations illustrate the dose-dependent treatment effects over time and establish the therapeutic dose–response effect of aducanumab.
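How a ΔΔ estimate and its 90% interval can be summarized from simulations is sketched below: for each simulated trial, subtract the placebo mean CFB from the treated mean CFB, then take the mean and a simple order-statistic 90% interval across trials. The simulated values are hypothetical and do not reproduce the predictions reported above; a negative ΔΔ indicates less worsening than placebo.

```python
import random


def delta_delta_summary(replicates, level=0.90):
    """replicates: (mean CFB on treatment, mean CFB on placebo) at one visit,
    one pair per simulated trial. Returns mean and percentile interval of ΔΔ."""
    dd = sorted(treated - placebo for treated, placebo in replicates)
    n = len(dd)
    k = int(round((1 - level) / 2 * (n - 1)))
    return sum(dd) / n, dd[k], dd[n - 1 - k]


rng = random.Random(78)
# Hypothetical simulated week-78 mean CFB values; not the study predictions.
reps = [(rng.gauss(1.20, 0.10), rng.gauss(1.50, 0.10)) for _ in range(1000)]
mean_dd, lo, hi = delta_delta_summary(reps)
print(f"ΔΔ = {mean_dd:.3f} (90% interval {lo:.3f}, {hi:.3f})")
```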

Fig. 6

Model-predicted treatment effect (ΔΔ) over time for the intent-to-treat randomized groups (a) and the dosing groups (3-, 6-, and 10-mg/kg titration groups) (b). The solid lines represent the mean, and the shaded regions represent uncertainty (90% confidence interval). The green dotted line represents the null effect. CDR–SB Clinical Dementia Rating–Sum of Boxes

One of the objectives of the ER analyses presented in this manuscript was to identify whether the drug effects were comparable between the ENGAGE and EMERGE studies after accounting for the heterogeneity in the dosing patterns. From the ER analysis presented in this manuscript, we were able to show that an ER model with a common (pooled) drug effect for both studies is the most parsimonious among a series of model runs that included differential drug effects by study and disease progression class (Supplementary Table 5). An external VPC, fitting the base ER model to EMERGE and predicting ENGAGE, suggests that ENGAGE behaves more similarly to EMERGE than to placebo (Fig. 2).

Despite the modeling efforts presented in this manuscript, we could not determine a definitive explanation for the discordant findings of the prespecified statistical analyses, in which EMERGE demonstrated a statistically significant treatment effect while ENGAGE did not. Taken together, the above analyses suggest that the pharmacology was consistent across studies and that CDR–SB (primary endpoint) scores were potentially affected by other unexplained factors beyond the heterogeneity in aducanumab dosing.

Conclusions

Overall, the PK–pharmacodynamic modeling analyses described here adequately characterized the ER of aducanumab for CDR–SB. Modeling of pooled data from the two large Phase 3 trials demonstrated that the intrinsic pharmacology remained consistent between the two studies. Importantly, these investigations necessarily relied on post hoc analyses; thus, the modeling results should be interpreted in that light and not as a replacement for the prespecified, primary study results, in which ENGAGE failed to demonstrate a statistically significant treatment effect. Finally, this work illustrates the value of pharmacometric approaches as a complement to conventional biostatistical approaches in providing supportive evidence of the efficacy of aducanumab in subjects with early AD.