Introduction

Exposure–response (E–R) modeling has become more impactful in drug development by enabling optimal dose selection and potentially providing a better understanding of a drug’s mechanism of action. Clinical trial endpoints are often disease scores that are either continuous or ordered categorical. For example, in rheumatoid arthritis (RA), two types of commonly used efficacy endpoints are the 28-joint disease activity score using CRP (DAS28) and 20, 50, and 70 % improvement in the American College of Rheumatology disease severity criteria (ACR20, ACR50, and ACR70) [1]. Parsimonious and informative E–R modeling of such endpoints is important in clinical development, especially at the late stage, where appropriate decisions rely on accurate predictions that often require models more complex than landmark analyses using direct correlation between exposure metrics and response [2].

A widely used class of E–R models includes the Types I–IV indirect response (IDR) models [3]. For continuous clinical endpoints, these models can be applied directly even though they were originally used to describe physiological variables, and their presumed consistency with the mechanism of drug action lends confidence to the model predictions. For discrete clinical endpoints, IDR models may be applied by means of the latent variable approach, most notably for ACR20, ACR50, and ACR70. Hutmacher et al. [4] initiated the latent variable IDR model for ACR20. Hu et al. [5] extended the latent variable approach to simultaneously model ACR20, ACR50, and ACR70, and later made the model more parsimonious [6]. The latter approach allows the placebo effect terms to be estimated separately using only placebo data, which eases the selection of initial parameter estimates. An equivalent form of its underlying IDR model was also used earlier on other clinical endpoints for psoriasis [7, 8]. Hu [2] substantiated the mechanistic rendition of latent variable IDR models by deriving the latent variable representations for common scenarios in which ordered categorical endpoints may be applicable, and summarized related theoretical characteristics and practical implementations.

It is not uncommon that two clinical endpoints are considered of similar importance, such as DAS28 and ACR responses in RA trials, where each has been used as the primary endpoint. In addition, multiple clinical endpoints may measure similar disease status and share some of the same information, e.g., both DAS28 (CRP) and ACR20/50/70 use tender and swollen joint counts, as well as CRP. While a panel of clinical endpoints is more informative than a single endpoint in clinical interpretation, the endpoints are usually modeled separately in practice [9]. The natural question is whether the same conclusions could be reached by considering these endpoints (1) individually, (2) separately yet together, or (3) jointly. This could be answered to some extent using individual modeling; however, the joint modeling approach directly addresses the consistency among the three methods through shared parameters. In addition, joint modeling facilitates the simulation of correlated endpoints, and thus the correct evaluation of joint decision criteria based on both endpoints. The correlation between the endpoints should indicate the level of similarity between the disease components measured by the endpoints. Indeed, a strong correlation between DAS28 and ACR response has been shown (e.g., [10]).

More importantly, when multiple clinical endpoints measure similar disease status, as with DAS28 and ACR20/50/70, the joint model framework provides the advantage of allowing the exploration of similar fixed effect and between-subject variability (BSV) parameters, guided by mechanistic rationale. Furthermore, joint modeling allows the remaining unexplained residual correlations to be estimated at the within-subject random effect level. For continuous endpoints, this can be easily handled by assuming a multivariate normal distribution for the residuals. For categorical endpoints, this can be accommodated by the latent variable framework, albeit the implementation is complex [11]. In the case of a continuous and a categorical endpoint, Hu et al. [12] provided a conditional approach to allow for the implementation of the joint model in NONMEM [13], a widely used software application for E–R modeling. Specifically, the residual correlation is modeled at the latent variable level with a bivariate normal distribution, and a negative correlation coefficient indicates that, given the structural and BSV model parameters, the chance of achieving ACR response criteria increases as DAS28 decreases. While estimating the residual correlation may not affect individual endpoint predictions, it improves the prediction of joint probability distribution of the endpoints, more specifically the assessment of the proportion of subjects achieving desirable responses simultaneously in more than one endpoint. This has been well established in statistical literature [11].

The objective of this manuscript was to explore the level of improvement that joint modeling may gain over separate modeling of a continuous and a categorical endpoint measuring the same disease status. Both approaches were applied to model the DAS28 score and the ACR20, ACR50, and ACR70 responses, using data from two previously published phase III clinical trials for the treatment of RA, for which we previously published a latent variable IDR model of the ACR20, ACR50, and ACR70 responses [6]. The results indicated that an improved fit of the ACR data could be achieved by leveraging subject-specific random effects whose estimation was made possible only with DAS28 data.

Methods

Data and information used for E–R modeling

The DAS28 score and ACR20, ACR50, ACR70 response were available in data from two phase III, parallel, placebo-controlled clinical trials of intravenously administered mAb X, Study 1 [14] and Study 2 [15], in patients with active RA despite prior use of methotrexate (MTX) therapy. Data used were the same as in the E–R modeling of ACR20, ACR50, ACR70 response described previously [6], with the additional inclusion of DAS28 scores. Briefly, Study 1 investigated the mAb X dose regimen of 2 mg/kg given at Weeks 0, 4, and every 8 weeks thereafter, briefly written hereafter as the q8 weekly regimen. Study 2 studied the mAb X dose regimens of 2 and 4 mg/kg given every 12 weeks. Both trials had MTX as placebo control arms, and subjects on the placebo arms were switched to the active arms of mAb X + MTX at Week 16, at which time they were eligible for early escape. The numbers of subjects in the E–R modeling dataset were 395, 197, 129, 126, and 129 respectively for the following treatment arms: mAb X 2 mg/kg + MTX q8 weeks (Study 1), placebo 1 (MTX, Study 1), mAb X 2 mg/kg + MTX q12 weeks (Study 2), mAb X 4 mg/kg + MTX q12 weeks (Study 2), and placebo 2 (MTX, Study 2).

Both studies had data through Week 48 available for analysis. At least ten serum concentration measurements of mAb X per patient were scheduled over the study period. DAS28, ACR20, ACR50, and ACR70 responses were evaluated at the time of each treatment administration in both studies. Study 1 had an additional evaluation at Week 2.

A population PK analysis using a two-compartment structural model implemented in NONMEM was performed using data from patients available for E–R modeling and additional data from other studies. The model described the data adequately and the details of the PK study data and analysis are described elsewhere. Results were consistent with a previous confirmatory analysis [16]. Empirical Bayesian parameter estimates were then used in a sequential modeling approach for the E–R modeling discussed below.

The final dataset contained 8474 DAS28 score and 7535 ACR response data points from 976 patients. A more detailed description of the study designs can be found in Hu et al. [6].

E–R model

The E–R model is specified by the DAS28 score and ACR response components along with the residual correlation model.

DAS28 score component

DAS28 scores were modeled by adopting a semi-mechanistic approach applied in earlier E–R analyses [7] as

$$ {\text{DAS28}}\left( {\text{t}} \right) = {\text{b}} - {\text{f}}_{\text{DAS28,p}} \left( {\text{t}} \right) - {\text{f}}_{\text{DAS28,d}} \left( {\text{t}} \right) + {\varepsilon} $$
(1)

where DAS28(t) is the observed DAS28 score, b is baseline, fDAS28,p(t) is placebo effect, fDAS28,d(t) is drug effect, and ε ~ N(0, σ2) represents the within-subject variability. The placebo effect was modeled empirically as

$$ {\text{f}}_{\text{DAS28,p}} \left( {\text{t}} \right) = {\text{b F}}_{\text{p,DAS28}} \left[ {1 - \exp \left( { - {\text{r}}_{\text{DAS28}} {\text{t}}} \right)} \right] $$
(2)

where 0 ≤ Fp,DAS28 ≤ 1 is the fraction of maximum placebo effect and rDAS28 is the rate of onset. The drug effect was modeled with

$$ {\text{f}}_{\text{DAS28,d}} \left( {\text{t}} \right) = {\text{b}}\left( {1 - {\text{F}}_{\text{p,DAS28}} } \right){\text{E}}_{\hbox{max} } \left[ {1 - {\text{R}}_{\text{DAS28}} \left( {\text{t}} \right)} \right] $$
(3)

where 0 ≤ Emax ≤ 1 represents fraction of maximum drug effect and, following a previous approach [7, 8], the drug effect was assumed to be driven by a latent variable RDAS28(t) governed by:

$$ \frac{d{\text{R}}_{\text{DAS28}}({\text{t}})}{dt} = k_{\text{in,DAS28}} \left( 1 - \frac{{\text{C}}_{\text{p}}}{IC_{50,{\text{DAS28}}} + {\text{C}}_{\text{p}}} \right) - k_{\text{out,DAS28}} \, {\text{R}}_{\text{DAS28}}({\text{t}}) $$
(4)

It was further assumed that at baseline RDAS28(0) = 1, yielding kin, DAS28 = kout, DAS28.
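
As an illustration only, the structure of Eqs. 1–4 for a typical subject (no BSV, no residual error) can be sketched in Python. The function name, the forward-Euler integrator, the step size, and the concentration function `cp` are assumptions made for this sketch; the actual analysis was implemented in NONMEM.

```python
import math

def simulate_das28(times, cp, b, fp, r, emax, ic50, kout, dt=0.01):
    """Typical-subject DAS28 prediction from Eqs. 1-4, without residual error.

    times : ascending observation times (hypothetical units)
    cp    : function returning drug concentration at time t (assumed input)
    """
    kin = kout  # follows from the baseline condition R(0) = 1
    R, t, out = 1.0, 0.0, []
    for t_obs in times:
        # Forward-Euler integration of Eq. 4 up to the next observation time
        while t < t_obs - 1e-12:
            h = min(dt, t_obs - t)
            c = cp(t)
            R += h * (kin * (1.0 - c / (ic50 + c)) - kout * R)
            t += h
        placebo = b * fp * (1.0 - math.exp(-r * t_obs))  # Eq. 2
        drug = b * (1.0 - fp) * emax * (1.0 - R)         # Eq. 3
        out.append(b - placebo - drug)                   # Eq. 1 without epsilon
    return out

# Hypothetical parameter values, constant concentration equal to IC50
profile = simulate_das28([0.0, 4.0, 16.0, 200.0], lambda t: 1.0,
                         b=6.0, fp=0.2, r=0.1, emax=0.5, ic50=1.0, kout=0.1)
```

With a constant concentration equal to IC50, the latent variable tends to 0.5, so the long-run drug effect in this sketch is half of b(1 − Fp,DAS28)Emax.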

The BSV on the parameters constrained to be between 0 and 1, i.e., Emax and Fp,DAS28, was modeled with normal distributions after a logit transformation. The BSV on the other parameters was modeled with lognormal distributions. Correlation between BSV terms was assessed with the corresponding multivariate normal distributions.
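
The two BSV parameterizations can be illustrated with a minimal Python sketch. The helper names are hypothetical, and `eta` denotes a subject-specific normal random effect; this is not the NONMEM code used in the analysis.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def individual_fraction(typical, eta):
    """Logit-normal BSV: the individual value stays strictly between 0 and 1."""
    return expit(logit(typical) + eta)

def individual_positive(typical, eta):
    """Lognormal BSV: the individual value stays strictly positive."""
    return typical * math.exp(eta)
```

With `eta = 0` both functions return the typical value, and any finite `eta` keeps the individual parameter inside its admissible range.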

ACR response component

As in previous implementations [6], the three endpoints ACR20, ACR50, and ACR70 were combined into one ordered categorical endpoint ACR having 4 possible outcomes: ACR = 1, if achieving ACR70; ACR = 2, if achieving ACR50 but not ACR70; ACR = 3, if achieving ACR20 but not ACR50; and ACR = 4, if not achieving ACR20. To facilitate residual correlation modeling [12], the following mixed-effect probit regression was used instead:

$$ \Phi^{ - 1} \left[ {{\text{prob}}\left( {{\text{ACR}} \le {\text{k}}} \right)} \right] = {{\upalpha }}_{\text{k}} + {\text{f}}_{\text{p,ACR}} \left( {\text{t}} \right) + {\text{f}}_{\text{d,ACR}} \left( {\text{t}} \right) + {{\eta }} $$
(5)

where Φ is the normal cumulative distribution function, k = 1, 2, or 3, αk are monotonically increasing in k, and η ~ N(0, ω2) represents inter-subject variability. As in previous applications [2, 6, 12], αk were re-parameterized as (α2, d1, d3) with d1, d3 > 0 such that α1 = α2 − d1 and α3 = α2 + d3, in order to stabilize parameter estimation. The placebo effect was modeled with a previously used exponential function [4, 9]:

$$ {\text{f}}_{\text{p,ACR}} \left( {\text{t}} \right) = - {\text{F}}_{\text{p,ACR}} \exp \left( { - {\text{r}}_{\text{p,ACR}} {\text{t}}} \right) $$
(6)

In the latent variable level representation, this has the same form as Eq. 2, as shown by Hu [2]. It is more flexible than that used in Hu et al. [6] and was chosen here to allow exploration of similarities between the endpoints. For the same reason, it is assumed that the drug effect is driven by a latent variable RACR(t), governed by the same form as Eq. 4:

$$ \frac{d{\text{R}}_{\text{ACR}}({\text{t}})}{dt} = k_{\text{in,ACR}} \left( 1 - \frac{{\text{C}}_{\text{p}}}{IC_{50,{\text{ACR}}} + {\text{C}}_{\text{p}}} \right) - k_{\text{out,ACR}} \, {\text{R}}_{\text{ACR}}({\text{t}}) $$
(7)

where Cp is drug concentration, and kin,ACR, IC50,ACR, and kout,ACR are parameters in a Type I IDR model. It was further assumed that at baseline RACR(0) = 1, yielding kin,ACR = kout,ACR. The reduction of RACR(t) was assumed to drive the drug effect through:

$$ {\text{f}}_{\text{d}} \left( {\text{t}} \right) = {\text{f}}_{\text{d,ACR}} \left( {\text{t}} \right) = {\text{DE}}_{\text{ACR}} \left[ {1 - {\text{R}}_{\text{ACR}} \left( {\text{t}} \right)} \right] $$
(8)

where DEACR is a parameter to be estimated that determines the magnitude of drug effect. We have previously shown that this representation of drug effect is equivalent to that of a change-from-baseline [6], and that DEACR may be interpreted as the baseline of the latent variable [2]. The mechanistic interpretation of the latent variable representation of Eqs. 5–8 is the same as that of Eqs. 1–4 [2], which allows for easy exploration of shared parameters between the DAS28 and ACR models.
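
To make the link between the cumulative probit model of Eq. 5 and the reported ACR20/50/70 response rates concrete, the following Python sketch converts the linear predictor into category probabilities. The function names are hypothetical, and the placebo and drug effect values `f_p` and `f_d` are assumed to be supplied from Eqs. 6–8.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def acr_category_probs(alpha2, d1, d3, f_p, f_d, eta=0.0):
    """Probabilities of ACR = 1, 2, 3, 4 under Eq. 5 for one subject and time."""
    alphas = (alpha2 - d1, alpha2, alpha2 + d3)        # re-parameterized intercepts
    cum = [PHI(a + f_p + f_d + eta) for a in alphas]   # prob(ACR <= k), k = 1, 2, 3
    return [cum[0], cum[1] - cum[0], cum[2] - cum[1], 1.0 - cum[2]]

def acr_response_rates(probs):
    """Map category probabilities to ACR20/50/70 response probabilities."""
    p1, p2, p3, _ = probs
    return {"ACR70": p1, "ACR50": p1 + p2, "ACR20": p1 + p2 + p3}
```

Because d1 and d3 are positive, the cumulative probabilities are ordered, which guarantees non-negative category probabilities and ACR20 ≥ ACR50 ≥ ACR70.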

DAS28-ACR joint model and residual correlation

Equations 1–8 were first fitted to DAS28 and ACR data separately, and then simultaneously with shared parameters explored. The joint model allows the potential residual correlation between ACR and DAS28 responses to be modeled using a bivariate normal distribution of ε = (εDAS28, εACR) with a correlation parameter ρ. Further details on the joint model likelihood specification and NONMEM implementation code were given in Hu et al. [12].

Model estimation and evaluation

The sequential PK/PD modeling approach was used by first fixing the previously obtained individual empirical Bayesian estimates of PK parameters. PK/PD model parameter estimation was implemented in NONMEM, using the LAPLACE method for early exploration and the importance sampling (IMP) method for key model runs. The NONMEM objective function value (OFV) approximates −2 times the log-likelihood. A change in OFV of 10.83, corresponding to a nominal p value of 0.001, was used as the criterion for including an additional parameter. Visual predictive check (VPC) was used for model evaluation by simulating 500 replicates of the dataset and comparing observed and model-predicted DAS28 scores and ACR response frequencies over the treatment period.
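
The correspondence between the OFV threshold and the nominal p value follows from the chi-square distribution with one degree of freedom, and can be checked with a short Python sketch (the function name is hypothetical):

```python
import math

def lrt_p_value_1df(delta_ofv):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)), with Z standard normal.
    """
    return math.erfc(math.sqrt(delta_ofv / 2.0))

p = lrt_p_value_1df(10.83)  # close to the nominal 0.001
```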

Results

Figure 1 shows the observed DAS28 time course by treatment group. High variability was apparent, both between and within subjects. In addition, baseline DAS28 in Study 1 appeared to be notably larger than that in Study 2.

Fig. 1 Observed 28-joint disease activity (DAS28) scores by treatment group overlaid with smoothing spline

Initial DAS28 model

Equations 1–4 were fitted to the DAS28 data. Early modeling explorations using the Laplace option could not reliably estimate any more BSV terms than those on b and IC50, and led to a sizable estimate (≅5, or >70 %) for BSV on IC50. While the parameter estimates appeared reasonable, standard error estimation appeared unstable. Since the adequacy of the Laplace approximation degrades as the magnitude of BSV increases, the IMP estimation option in NONMEM was used for key model runs. An additional parameter bs was used to account for the baseline difference between Study 1 and Study 2, and BSV terms were included on b, pDAS28, IC50,DAS28 and Emax, with a full variance–covariance matrix accounting for their correlations. Attempting to reduce the BSV terms or the correlation parameters, or to include additional BSV terms, resulted in either a notably worse fit or insufficient improvement in the fit. Table 1 shows the parameter estimates. Estimation precision was reasonable, with standard errors (SE) generally an order of magnitude lower than the estimates. SE is presented instead of the often used relative standard error in order to provide an appropriate comparison of estimation precision among different models. Figure 2 shows the VPC results, where the high variability of the observed data is consistent with Fig. 1. Overall, the model reasonably described the observed data trends.

Table 1 DAS28 exposure–response model parameter estimates
Fig. 2 Visual predictive check of the 28-joint disease activity (DAS28) score for the initial model. The 5th, 50th and 95th percentiles of observed DAS28 scores are overlaid with the 90 % prediction intervals (PI) of their model predictions at planned observation times by treatment. PBO placebo

Initial ACR model

The model described by Eqs. 5–8 was applied to the ACR response data. Table 2 shows the parameter estimates. Estimation precision was reasonable, with SEs generally an order of magnitude lower than the estimates. IC50 and kout estimates were similar to those obtained previously [6]. From a theoretical perspective, the DE estimate from logistic regression could be expected to be larger than that from probit regression by approximately a factor of 1.8 (≅\( \sqrt {\pi^{2} /3} \)), because the variances of the standard logistic and the standard normal distributions are \( \pi^{2} /3 \) and 1, respectively. Taking this difference into account, the differences between the DE estimates were not unexpected. Other parameter estimates differed in part due to the earlier use of a reduced placebo model. VPC results are shown in Fig. 3 and are similar to those in the previous analyses [6], as could be expected [2]. As previously noted, the high observed placebo responses at Weeks 20 and 24 may be due to early escape [6].
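
The scaling factor between logistic and probit coefficients quoted above can be reproduced with a short Python sketch. The function name is hypothetical, and the conversion is only the approximate variance-matching rule described in the text, not an exact equivalence between the two link functions.

```python
import math

# Standard deviation of the standard logistic distribution, sqrt(pi^2 / 3)
LOGISTIC_SD = math.pi / math.sqrt(3.0)

def probit_to_logistic_scale(coef_probit):
    """Approximate logistic-scale counterpart of a probit-scale coefficient."""
    return coef_probit * LOGISTIC_SD
```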

Table 2 ACR exposure–response model parameter estimates
Fig. 3 Median model predictions at planned observation times and 90 % prediction intervals (PI), in overlay with observed American College of Rheumatology (ACR) response frequencies, for the initial ACR model. ACR20/50/70, 20 %/50 %/70 % improvement in the American College of Rheumatology criteria

Joint E–R model

The base scenario of fitting Eqs. 1–8 simultaneously with no shared parameters between the DAS28 and ACR model components is equivalent to estimating the DAS28 and ACR models separately. Indeed, the sum of the OFVs of the DAS28 and ACR models was nearly identical to the OFV of the simultaneously fitted model with no shared parameters. From Tables 1 and 2, the rate parameters, namely rDAS28 and kout, appeared similar between the endpoints, along with IC50. This was substantiated by the joint model evaluations, which showed an insignificant OFV change for sharing each parameter in the joint model.

This result motivated the question of whether further similarities between the endpoints could be found. It could be hypothesized that, based on binding, a single latent variable could govern both endpoints through IDR models. The similarity of the kout parameters suggested the possibility of using a single IDR model instead of two. If so, the latent variable may not be separately identifiable from DAS28. The question, however, is in what sense the placebo and drug effects could be considered similar for both endpoints, particularly because the DAS28 and ACR endpoints have different scales. This rationale motivated the use of DAS28 change-from-baseline, as ACR endpoints are defined as change-from-baseline and the latent variable is defined only up to a constant [2, 4]. Following the notation and motivation provided by Hu [2], this led to the following ACR model in place of Eqs. 5–8:

$$ \Phi^{ - 1} \left[ {{\text{prob}}\left( {{\text{ACR}} \le {\text{k}}} \right)} \right] = {{\upalpha }}_{\text{k}} - {\text{M}}\left( {\text{t}} \right) + {{\eta }} $$
(9)

where M(t) = Lm[fDAS28,p(t) + fDAS28,d(t)]/b is the scaled change-from-baseline DAS28 score model prediction given in Eq. 1, with BSV terms as given in the initial DAS28 score model.
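
A minimal Python sketch of the predictor M(t) is given below. The function name is hypothetical, and the value of the latent variable RDAS28(t) is assumed to be supplied from a solution of Eq. 4. Because both fDAS28,p(t) and fDAS28,d(t) are proportional to b, the baseline cancels in the ratio.

```python
import math

def scaled_das28_change(t, r_latent, lm, fp, r, emax):
    """M(t) = Lm [f_DAS28,p(t) + f_DAS28,d(t)] / b from Eqs. 2 and 3.

    r_latent is the value of R_DAS28(t); the baseline b cancels out.
    """
    placebo_frac = fp * (1.0 - math.exp(-r * t))       # f_DAS28,p(t) / b
    drug_frac = (1.0 - fp) * emax * (1.0 - r_latent)   # f_DAS28,d(t) / b
    return lm * (placebo_frac + drug_frac)
```

At baseline, t = 0 and RDAS28(0) = 1, so M(0) = 0 regardless of the subject-specific parameter values.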

Fitting Eqs. 1–4 and 9 simultaneously to the DAS28 and ACR response data resulted in a NONMEM objective function decrease of over 2000, indicating a significant improvement of the fit, despite using four fewer parameters than the base scenario. Including the residual correlation between DAS28 and ACR responses further reduced the NONMEM objective function by over 1900. This was considered the final model, with parameter estimates of the DAS28 and ACR response components given in Tables 1 and 2, respectively.

Table 1 shows that for DAS28, the joint model parameter estimates and associated SEs were generally similar to those obtained with the initial model using only DAS28 data. Table 2 shows that for ACR, the joint model used no additional parameters for the placebo and drug effects other than the scaling parameter Lm. Estimates of the intercept parameter α2 between the initial and joint models are not directly comparable, as the average intercept value is determined up to a constant with the latent variable [2]. Estimates of d1 and d3, the intercept differences, were similar between the initial and joint models. The estimate of ω2 was smaller in the joint model, because the treatment effect predictor M(t) in Eq. 9 contains BSV components whereas Eqs. 6–8 do not. This contribution of BSV components of the DAS28 model is the main explanation for the improved fit of the joint model in light of the NONMEM objective function decrease of over 1900 mentioned above. Coupled with a high absolute value of the correlation parameter estimate (0.655) shown in Table 1, this confirms that the two endpoints measure the same component of the disease.

VPC results of the joint model for DAS28 were visually indistinguishable from Fig. 2, and thus are not shown. This is consistent with the similarity between the parameter estimates of the joint and the separate models. VPC results of the joint model for the ACR response are shown in Fig. 4. The VPC results of the joint model before incorporating the residual correlation component were visually indistinguishable from Fig. 4 and are not shown. The results appeared largely similar to Fig. 3; where differences occurred, it may be difficult to determine whether the joint model or the individual ACR model prediction better represents reality. However, it is reasonable to expect the ACR70 response rates for the placebo arms to be small in early treatment periods, and in this respect the joint model prediction in Fig. 4 is better than the separate ACR model prediction in Fig. 3. This may be attributed to the more reasonable partitioning of BSV onto additional model parameters under the joint model instead of only at the intercept level under the separate ACR model. It can be seen that under Eq. 5, the expected ACR response rate at time of (or near) 0 is given by [2, 17]

$$ {\text{E}}\left\{ {{\text{prob}}\left[ {{\text{ACR}} \le {\text{k}}} \right]} \right\} = \Phi \left( {\upalpha_{\text{k}} /\sqrt {1 + \upomega^{2} } } \right) $$
(10)

and that the larger predicted ACR70 response rate in the separate model was caused by the larger ω2 estimate. In contrast, the joint model had part of this BSV component partitioned onto other parameters, namely through BSV terms on b, pDAS28, IC50,DAS28 and Emax, none of which contributed to the BSV of ACR responses at time near 0 since M(0) = 0, which led to the smaller overall BSV predictions near time 0. This therefore suggested that the latent variable joint model may partition the BSV more appropriately by allowing it to vary over time, as opposed to remaining constant under the separate model.

Fig. 4 Median model predictions at planned observation times and 90 % prediction intervals (PI), in overlay with observed American College of Rheumatology (ACR) response frequencies, for the final joint model. ACR20/50/70, 20 %/50 %/70 % improvement in the American College of Rheumatology criteria
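
The closed form in Eq. 10 can be checked numerically by integrating the subject-level probit probability over the distribution of η. The following Python sketch uses a simple trapezoid rule; the function names, grid size, and integration width are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def expected_prob_numeric(alpha, omega, n=4001, width=8.0):
    """Trapezoid-rule integral of Phi(alpha + eta) over eta ~ N(0, omega^2)."""
    dist = NormalDist(0.0, omega)
    lo, hi = -width * omega, width * omega
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        eta = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * _N.cdf(alpha + eta) * dist.pdf(eta)
    return total * h

def expected_prob_closed_form(alpha, omega):
    """Right-hand side of Eq. 10."""
    return _N.cdf(alpha / math.sqrt(1.0 + omega ** 2))
```

For a negative intercept, as would apply to a response category achieved by a minority of subjects, a larger ω shrinks the argument of Φ toward zero and therefore raises the expected response rate, which is the mechanism described above.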

In order to understand the source of improvement in the joint model fitting, it may be tempting to examine the OFV changes in the DAS28 component and the ACR component. However, this would be exceedingly difficult, because the two components are not independent without conditioning on the BSV terms. Nevertheless, there are two relevant observations in comparing the separate and the joint models: (1) in the DAS28 component, changes in parameter estimates and associated SEs were minor, and the VPC results were virtually unchanged; (2) in the ACR component, the original BSV variance ω2 was markedly reduced in the joint model. These suggested that the improvement in OFV mostly came from the ACR component, through the presence of the additional BSV terms in the latent variable.

Discussion

Our analysis supported and utilized the relatedness between DAS28 and ACR responses, which are designed to measure the same disease component, through a shared structural model component along with the related BSVs and the residual correlation. A common view on the modeling of multiple endpoints is that, while joint modeling may improve the overall fit as measured by the likelihood or predictions of correlated responses, it would not affect the descriptions or predictions of the individual endpoints [11]. In contrast, the joint model for DAS28 and ACR responses developed here improved the characterization of the ACR response, in a manner unrelated to the residual correlation component. This is because the number of random effects used to describe the ACR response actually increased by three under the joint model, namely through the BSV terms under the latent variable M(t) in Eq. 9. The estimation of these additional random effects could not be reliably supported by ACR data alone [6], and was made possible only with the added DAS28 data under the joint model framework. This feature may hold more generally when the endpoints include both continuous and categorical types, where the continuous endpoint data may support random effects not estimable from the categorical endpoint data alone, thus resulting in a better description of the categorical endpoints. This demonstrates the improvement achievable by the joint modeling approach more clearly than the previous applications [11, 12]. It is noted that the joint approach requires considerably more effort and computational time. On the other hand, categorical endpoint modeling alone typically cannot support the estimation of more than one BSV term. It is also noted that the magnitude of BSV at the intercept level determines the response probability together with the underlying exposure, as can be seen in Eq. 10 (see more details in Hutmacher and French [17]). The question thus arises of how this would affect model predictions, especially the associated variability. The joint model can provide valuable insights into this important question in clinical development.

A study effect term was used to account for baseline DAS28 differences between the two studies. Estimation results of the remaining parameters were similar to those in an earlier model without the study effect on baseline. However, the VPC of this earlier model showed systematic differences between the model-predicted and observed data trends, which could easily lead to doubt regarding the predictive ability of the model. The study effect has descriptive but not predictive value, and the inclusion of such covariates should be undertaken with caution. As subjects are randomized only within the studies and not to them, the study effect on baseline served the purpose of adjusting for imbalances between studies. Inclusion of a study effect on other model components could easily lead to interpretation difficulties and is thus generally not advisable.

An often encountered practical difficulty with IDR modeling is explaining the nature of the underlying E–R relationship to unfamiliar audiences. It is especially difficult to illustrate how it influences the time course of the response, since the E–R relationship can only be plotted under a theoretical steady-state infusion, which may be of little help in understanding the actual response in clinical applications where dosing intervals are often as long as the drug half-life or longer. However, IDR models have been used increasingly in E–R modeling of clinical endpoints and have certain advantages compared with other approaches such as the Markov transition approach or direct correlations between the primary efficacy endpoint and simple exposure metrics using area under the drug concentration–time curve (AUC) or trough drug concentration (Ctrough) [2]. In practical applications, placebo effects have been modeled in different forms under the IDR models and may be broadly classified as 1-pathway and 2-pathway approaches [2, 18]. In part due to the lack of mechanistic understanding, practical choices were perhaps often made according to the relative ease of implementation in the particular application, and may differ even when multiple endpoints were obtained from the same trials [9]. Maintaining consistent mechanistic interpretations of the model forms for the different endpoints is important to allow joint modeling to fully achieve its benefits.