
1 Introduction

Handling missing data in longitudinal clinical trials has gained considerable interest in recent years among academic, industry, and regulatory statisticians alike. Although substantial research has been devoted to statistical methods for missing data, no single analytic approach has been accepted as universally optimal. A common recommendation is to conduct sensitivity analyses under different assumptions to assess the robustness of the analysis results from a clinical trial. For decades, the gold standard for longitudinal clinical trials has been the mixed model for repeated measures (MMRM; see, e.g., Mallinckrodt et al. 2008). However, the MMRM analysis requires the assumption that all missing data are missing at random (MAR), an assumption that cannot be verified and might even be considered unlikely for some study designs and populations. The current expectation is that regulatory agencies will require sensitivity analyses to evaluate the robustness of the analytic results to different missing data assumptions (European Medicines Agency 2010; National Academy of Sciences 2010). Clinical trial statisticians are thus well-advised to understand the ramifications that various missing data mechanisms (MDMs; see, e.g., Little and Rubin 1987) have on their proposed analyses, most notably on bias, type I error control, and power. To that end, Monte-Carlo simulations are often used to conduct trial simulations under different MDMs for evaluating statistical analysis models.

In this chapter, we present three simulation-based approaches related to missing data issues in longitudinal clinical trials. First, a simulation-based approach is developed for generating monotone missing multivariate normal (MVN) data under a variety of MDMs (i.e., missing completely at random [MCAR], MAR, and missing not at random [MNAR]), which allows users to specify the expected proportion of missing data at each longitudinal time point. Second, a simulation-based approach is used to implement a recently proposed “tipping-point” sensitivity analysis method. Specifically, a delta-adjustment is applied to account for the potential difference in the estimated treatment effects between the mixed model (typically used as the primary model in clinical trials) and the multiple imputation model used to facilitate the tipping point analysis. Last, a Bayesian Markov chain Monte-Carlo (MCMC) method for control-based imputation is considered that provides a more appropriate variance estimate than conventional multiple imputation. Computer programs for some of these methods are provided in SAS.

In practice, there are two types of missing data. The first type is intermittent missingness, where the missing data for a subject are followed by at least one timepoint at which data are observed. The second type is monotone missingness, which is typically caused by study attrition (i.e., a subject drops out of the study early). Common reasons for intermittent missing data include missed subject visits, data collection errors, or data processing (e.g., laboratory) errors. Because these errors are unlikely to be related to the value of the data itself (had that value been observed), an assumption that the intermittent missing data are MAR, or even MCAR, is often appropriate. Therefore, it may be considered reasonable to first impute the intermittent missing data under MAR before performing the analysis (e.g., Chap. 4, O’Kelly and Ratitch 2014). With this consideration, the discussion that follows focuses primarily on monotone missing data. Caution should be exercised, however, because intermittent missing data might be MNAR in studies where the disease condition is expected to fluctuate over time.

2 Generation of Study Data with a Specified MDM and Cumulative Drop-Out Rates

Whether the clinical trial statistician wants to investigate novel approaches to the analysis of missing data or simply wants to compute power for an upcoming study, it is often useful to generate MVN data (given mean \(\mu \) and covariance matrix \(\Sigma \)) under a specific MDM, with specified expected cumulative drop-out rates at each longitudinal timepoint. This section presents a method for generating monotone missing data; a simple process for adding intermittent (i.e., nonmonotone) missing data is outlined at the end of the section.

It is of interest to generate longitudinal MVN data, given \(\mu \) and \(\Sigma \), with expected cumulative drop-out rates (CDRs) over time under a given monotone MDM. For the MCAR MDM, this is easily accomplished by first generating a subject-specific U(0,1) random variate and then comparing that variate to the target CDR at each timepoint. Starting with the first postdose timepoint, if the random variate is smaller than the target CDR, then that subject is considered to have dropped out, with the data at that timepoint and all subsequent timepoints set to missing.
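As a concrete sketch of this MCAR step, the following SAS data step imposes monotone MCAR drop-out on a complete dataset; the dataset and variable names, the target CDRs, and the seed are illustrative assumptions, not part of the chapter's programs.

```sas
/* Hypothetical sketch: impose monotone MCAR drop out on complete data.
   One record per subject with postdose columns y1-y4 is assumed.      */
data mcar_missing;
  set simdata;
  call streaminit(20170101);
  array y{4} y1-y4;
  array cdr{4} _temporary_ (0.086 0.165 0.236 0.300);  /* target CDRs  */
  u = rand("Uniform");                /* one U(0,1) draw per subject    */
  do t = 1 to 4;
    if u < cdr{t} then y{t} = .;      /* CDRs increase with t, so the
                                         missingness is monotone        */
  end;
  drop u t;
run;
```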

The approach for the MAR and MNAR MDMs is more complicated because the missingness for these MDMs depends on the data itself, and analytic or closed-form solutions are not yet available for a general MDM specification. As opposed to the unconditional approach used for the MCAR MDM, the subjects who have already dropped out need to be accounted for. Specifically, the conditional probability of dropping out at Time t, given that the subject is still in the study at Time t−1, needs to be calculated for each individual. Defining \(\mathrm{CDR}_t\), \(t = 1\) to T, as the desired expected cumulative drop-out rate at postdose Time t, this conditional probability is \((\mathrm{CDR}_t - \mathrm{CDR}_{t-1})/(1 - \mathrm{CDR}_{t-1})\). For example, with \(\mathrm{CDR}_1 = 0.086\) and \(\mathrm{CDR}_2 = 0.165\), the conditional drop-out probability at Time 2 is \((0.165 - 0.086)/(1 - 0.086) \approx 0.086\). For the purposes of this chapter, the baseline time point is assumed to be nonmissing (i.e., \(\mathrm{CDR}_0 = 0\)).

Let \(Y_{tjk} \sim N(\mu _k, \Sigma )\), with \(t = 0\) to T, \(j = 1\) to n, and \(k = 1\) to K, for T postdose timepoints, n total observations, and K total groups (e.g., treatment arms). As noted, the baseline measurement, \(Y_{0jk}\), is assumed to be nonmissing. Let \(p_{tj}\) represent the estimated conditional probability that subject j drops out at postdose time t, conditioned on subject j not having already dropped out. Finally, let \(\Psi \) represent one or more tuning parameters governing the effect of the Y values on \(p_{tj}\), with a positive (negative) value of \(\Psi \) indicating that higher (lower) values of Y are more likely to result in drop out.

Specifically, a logistic model, \(\mathrm{logit}(p_{tj}) = f(Y_j, {\varvec{\alpha }}, \Psi )\), is considered, with the tuning parameter(s) \(\Psi \) prespecified by users based on the desired MDM. The vector \({\varvec{\alpha }} = (\alpha _1, \ldots , \alpha _T)\) is then estimated via Monte-Carlo simulation such that the resulting missing data rates are sufficiently close to the specified CDRs (per the user-defined tolerance parameter \(\epsilon \)). Without loss of generality, consider the following MDM, which follows a simple MAR process in which missingness at a given timepoint is solely a function of the observation at the previous timepoint (conditioned on the subject having not already dropped out). To simplify notation, the subject indicator j for \(p_{tj}\) and \(y_{tj}\) is suppressed in the following formulas:

$$\begin{aligned} \mathrm{logit}(p_t) = \sum _{i=1}^t \alpha _i + \psi \, y_{t-1}, \quad t = 1, 2, \ldots , T. \end{aligned}$$
(1)

The \(\alpha _i\) are solved in a stepwise manner, first solving for \(\alpha _1\) using \(\mathrm{logit}(p_1) = \alpha _1 + \psi \, y_0\), such that \(\hat{p}_1\) is sufficiently close to \(\mathrm{CDR}_1\).

Next, solve for \(\alpha _2\) using \(\mathrm{logit}(p_2) = \hat{\alpha }_1 + \alpha _2 + \psi \, y_1\), such that \(\hat{p}_2\) is sufficiently close to \((\mathrm{CDR}_2 - \mathrm{CDR}_1)/(1 - \mathrm{CDR}_1)\), where \(\hat{\alpha }_1\) was estimated in the previous step. Before iteratively solving for \(\alpha _2\), subjects simulated to have dropped out must be identified and excluded from the simulated dataset. This is accomplished by comparing a subject- and timepoint-specific U(0,1) variate with the subject-specific value of \(p_1\) (which is, in part, a function of the newly solved \(\hat{\alpha }_1\)). This process continues through Time T.

Each \(\alpha _t\) is solved using a bisection approach in conjunction with a large random sample drawn from the specified MVN distribution, with convergence for \(\alpha _t\) declared when

$$ \left| \hat{p}_t - (\mathrm{CDR}_t - \mathrm{CDR}_{t-1})/(1 - \mathrm{CDR}_{t-1}) \right| < \epsilon , $$

where \(\epsilon \) is a user-defined convergence criterion and \(\hat{p}_t\) is a function of Y, \(\Psi \), and \(\hat{\alpha }_t\). The detailed steps for calculating the \(\hat{\alpha }_t\) are provided in the section that follows.

General Algorithm to Solve for \({\varvec{\alpha }}\)

A bisection approach is used to solve for each \(\alpha _t\) sequentially as follows (a SAS/IML sketch is provided after the algorithm):

  0. Generate a large dataset of observations (e.g., 100,000), Y, comprised of \(Y_k \sim N({\varvec{\mu }}_k, {\varvec{\Sigma }})\), with the proportion of observations following the distribution of \(Y_k\) equal to \(\pi _k\), as determined by the treatment allocation ratio per the study design.

Do Steps 1–9 for t \(=1\), ..., T:

  1. Initialize \(\alpha _L = -10000\), \(\alpha _C = 0\), \(\alpha _U = 10000\), DONE \(=0\), COUNTER \(=0\).

  2. If \(t \ge 2\), simulate the missingness of observations at earlier timepoints using the previously computed \(\alpha \) values, and delete any subjects simulated as having dropped out.

  3. For each remaining observation in Y, compute an estimate \(\widehat{f(\alpha _{tj})} = f(Y, {\varvec{\alpha }}, \Psi )\), which is a function of the previously estimated \(\alpha \) values, of \(\alpha _C\), and of some function of y and \(\Psi \) (depending on the MDM model).

  4. Compute \(\hat{p}_{tj} = (1 + \exp (-\widehat{f(\alpha _{tj})}))^{-1}\) for each remaining observation in Y.

  5. Compute \(\hat{p}_t\) as the mean of the \(\hat{p}_{tj}\).

  6. Compute DIFF \(= \hat{p}_t - (\mathrm{CDR}_t - \mathrm{CDR}_{t-1})/(1 - \mathrm{CDR}_{t-1})\), a measure of how accurate the current guess at \(\alpha _t\) \((= \alpha _C)\) is.

  7. (a) If \(|\mathrm{DIFF}| < \epsilon \) then DONE \(=1\) (we are satisfied with \(\alpha _C\));

     (b) else if DIFF > 0 then \(\alpha _U = \alpha _C\) and \(\alpha _C = (\alpha _L + \alpha _C)/2\) (i.e., search lower);

     (c) else if DIFF < 0 then \(\alpha _L = \alpha _C\) and \(\alpha _C = (\alpha _U + \alpha _C)/2\) (i.e., search higher).

  8. COUNTER \(=\) COUNTER \(+\,1\); if COUNTER \(=50\) then DONE \(=1\). (The cap of 50 may be adjusted by users to avoid an endless loop due to nonconvergence, though it is likely sufficient.)

  9. If not DONE, go to Step 3; if DONE, set \(\hat{\alpha }_t = \alpha _C\).

After the \({\varvec{\alpha }}\) vector has been estimated, the actual missing data of interest can be simulated by randomly generating the complete MVN dataset (Y), followed by the determination of missingness. Based on \({\varvec{\alpha }}\), \(\Psi \), the MDM model, and the randomly generated complete dataset Y, subject-specific cutpoints \(\hat{p}_{tj}\) are computed. These cutpoints represent the probability that Subject j drops out of the study at timepoint t (conditioned on not already having dropped out). For each timepoint, a uniform variate is generated and compared to the appropriate cutpoint to determine whether the subject drops out at that timepoint. The process starts at the first postdose timepoint and proceeds sequentially up to the last timepoint T. As noted above, this process is actually required in the stepwise generation of the \(\alpha \) values themselves (Step 2 in the algorithm above).

We note that the proposed algorithm is set up to handle a mixed distribution, with the CDRs at each timepoint defined over all treatment arms. Of course, the different treatment arms will presumably still have different CDRs as a function of the \(\mu _k\). If different defined CDRs are desired for each treatment arm (as opposed to defining the CDRs over all treatment arms and letting the \(\mu _k\) provide the differentiation between the treatment drop-out rates), then the algorithm needs to be run once for each such arm (or group of arms), yielding a different \({\varvec{\alpha }}\) vector for each.

In the absence of historical treatment-specific drop-out information, one could take a two-step approach to specifying the CDRs. The first step involves running the algorithm assuming a single CDR (perhaps corresponding to placebo data from the literature) over all treatment arms and letting the assumed efficacy (i.e., the \(\mu _k\)) drive the treatment-specific drop-out rates. One could then use available safety information on the drug (and placebo) to fine-tune those treatment-specific CDRs, generating a separate \({\varvec{\alpha }}\) vector for each treatment group.

The process described above can also be used to generate study data with an MNAR MDM. For example, the model \(\mathrm{logit}(p_t) = \sum _{i=1}^t \alpha _i + \psi \, y_{t-1}\) can be replaced with two options: (a) \(\mathrm{logit}(p_t) = \sum _{i=1}^t \alpha _i + \psi \, y_t\), in which the missing probability depends on the missing data; or (b) \(\mathrm{logit}(p_t) = \sum _{i=1}^t \alpha _i + \psi _1\, y_{t-1} + \psi _2\, y_t\), in which \(p_t\) depends on both the observed \(y_{t-1}\) and the missing data \(y_t\). In all of these models, the tuning parameters \(\psi \), \(\psi _1\), and \(\psi _2\) are prespecified by users.

One might also wish to add intermittent (i.e., nonmonotone) missing data. This can be accomplished by generating a vector (one element for each postdose timepoint) of independent uniform variates for each subject and then comparing that vector to cutpoints (timepoint-specific, as desired) corresponding to the missing probability at each timepoint (e.g., 0.01). Presumably, this process would be applied in conjunction with the monotone process, with the monotone process meant to emulate missingness due to subject drop-out and the intermittent process meant to emulate an MCAR process. In such a case, the process used to generate the intermittent missing data is applied independently of the process used to generate the monotone missing data, with a given observation designated as missing if either process determines that observation is missing. The overall probability of missingness will clearly be higher than the level specified for either process on its own, thus requiring some downward adjustment of the defined drop-out probabilities of one or both processes.
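A sketch of this overlay, assuming a constant 1 % intermittent missingness probability per timepoint and the illustrative names from the earlier sketch:

```sas
/* Hypothetical sketch: overlay MCAR intermittent missingness on data
   that already carry monotone missingness.                             */
data with_intermittent;
  set mcar_missing;
  call streaminit(20170103);
  array y{4} y1-y4;
  do t = 1 to 4;
    if y{t} ne . and rand("Uniform") < 0.01 then y{t} = .;
  end;
  drop t;
run;
```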

Example: Power and Bias Evaluation for a Longitudinal Study with Missing Data

For sample size and power calculations, analytic approaches are available when missing data are MAR (e.g., Lu et al. 2008, 2009). In general, the power loss stemming from missing data in a longitudinal trial depends on the proportion and timing of the missing data, that is, on the cumulative drop-out rates (CDRs), which determine the effective information yielded by the observations over time. For example, one would expect a study with drop-outs occurring gradually over time to have less power than a study in which all of the drop-outs occurred between the second-to-last and the last (presumably primary) timepoint.

Despite the available analytic approaches, power calculations for longitudinal clinical trials are often conducted via simulations, given the extreme flexibility that simulations afford. The use of simulations is especially common in trials with complicating factors such as (a) interim analyses for futility or for overwhelming efficacy, (b) multiplicity approaches covering multiple timepoints/endpoints, or (c) adaptations built into the designs (e.g., dropping an arm or adjusting the randomization ratio as a function of the accruing data). Of course, power calculations can also be simulated for relatively straightforward clinical trial designs.

The following simulation study investigates the effect that different methods of generating random MVN data, primarily with respect to MDMs, have on both the bias of the parameter estimates and the corresponding power calculations. The assumed parameters are based on data obtained from an actual clinical trial. The results from 10,000 simulation runs are summarized in Table 1.

The simulations used the following assumptions (four postdose timepoints):

\(\alpha =0.050\);

N per arm \(=120\);

\(\mu _{\mathrm{Pbo}} = (0.00, 0.46, 0.92, 1.37, 1.83)\); \(\mu _{\mathrm{Act}} = (0.00, 0.23, 0.46, 0.69, 0.92)\) (higher means represent lower efficacy);

A functional form was used for the variance-covariance matrix, with \(\sigma _i = c^i\, \sigma _0\), where \(\sigma _0 = 0.860\) and \(c = 1.288\), \(i = 1, \ldots , 4\); and \(\rho _{ij} = r\, b^{|j-i|-1}\), where \(r = 0.748\) and \(b = 0.832\), \(i, j = 0, \ldots , 4\);

Notably, \(\sigma _0^2 = 0.740\), \(\sigma _4^2 = 5.616\), and \(\rho _{0,4} = 0.431\), with \(\mathrm{Var}(Y_4 - Y_0) = 4.6\);

CDR \(=\) (0.086, 0.165, 0.236, 0.300), i.e., a 30 % CDR at Timepoint 4 (T4).
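As a check, the quoted variances and correlation follow directly from the functional form; the small PROC IML sketch below reproduces the figures to rounding (nothing here is part of the chapter's programs).

```sas
/* Check of the quantities implied by the functional form above.        */
proc iml;
s0 = 0.860;  c = 1.288;  r = 0.748;  b = 0.832;
sd = s0 # c##(0:4);                      /* sigma_0, ..., sigma_4       */
rho04 = r # b##3;                        /* rho_{0,4} = r * b**(4-0-1)  */
varDiff = sd[1]##2 + sd[5]##2 - 2 # rho04 # sd[1] # sd[5];
print (sd[1]##2)[label="Var(Y0)"] (sd[5]##2)[label="Var(Y4)"]
      rho04 varDiff;                  /* approx 0.740, 5.6, 0.431, 4.6  */
quit;
```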

The following missing data patterns (MDPs) were considered (with \(\Psi = \Psi _1 = \Psi _2 = 0.5\)):

  • MDP0: No missing data.

  • MDP1: MCAR

  • MDP2: Data are MCAR but only the baseline and last timepoint values are included in the analysis (Completers Analysis).

  • MDP3: MAR with \(\mathrm{logit}(p_t) = \sum _{i=1}^t \alpha _i + \psi \, y_{t-1}\)

  • MDP4: MNAR with \(\mathrm{logit}(p_t) = \sum _{i=1}^t \alpha _i + \psi \, y_t\)

  • MDP5: Mixture of MAR and MNAR with \(\mathrm{logit}(p_t) = \sum _{i=1}^t \alpha _i + \psi _1\, y_{t-1} + \psi _2\, y_t\)

Table 1 Summary of missing data, power, and estimated treatment effects from MMRM at time 4 (10,000 simulations)

Although the simulations were conducted by defining a 30 % drop-out rate over the two treatment arms, readers should note that the two treatment arms still have different drop-out rates under the MAR and MNAR MDMs. This difference is a result of higher efficacy over time in the test drug group as compared with the placebo group. In the data simulations, we used \(\Psi = \Psi _1 = \Psi _2 = 0.5\) so that a higher (worse) value of the observed (in MDP3) or unobserved (in MDP4 and MDP5) response results in a higher probability of drop-out. This emulates drop-out due to lack of efficacy, which is common in clinical trials.

In going from MDP2 to MDP1, a modest gain in power is observed as a result of using the partial data from subjects who dropped out prior to T4. This gain underscores the importance of using the full longitudinal dataset when calculating power, as opposed to considering only the final timepoint.

Focusing on the completers only, clear bias can be observed for the MAR and MNAR scenarios (MDP3 to MDP5); this is important to note when powering a study based on simple summary statistics from completers, as is often done when using results from the literature. As expected, the MMRM is unbiased for all of the MCAR scenarios (MDP0 to MDP2), as well as for the MAR scenario (MDP3), since the MMRM assumes that all missing data are MAR. Conversely, bias is present in the MMRM analysis for both of the MNAR scenarios (MDP4 and MDP5). Since this bias is to the detriment of the drug under study (i.e., it dilutes the estimated treatment effect), these scenarios result in roughly 2 % lower power. Compared with the completers-only results, however, the MMRM analysis had relatively small bias.

3 Tipping Point Analysis to Assess the Robustness of MMRM Analyses

As mentioned in Sect. 1, the MMRM, which is often used as the primary analysis model, assumes that all missing data are MAR, an assumption that cannot be verified. Both clinical trial sponsors and regulatory agencies are interested in assessing the robustness of any conclusions from an MMRM analysis against deviations from the MAR assumption. Given this interest, many methods for sensitivity analysis have been proposed and developed (see, e.g., NRC 2010 and references therein). Some of the more notable proposed methods include selection models, pattern-mixture models, and controlled-imputation models (see, e.g., Carpenter et al. 2013; Mallinckrodt et al. 2013; O’Kelly and Ratitch 2014). Another method that has recently gained attention is the so-called tipping point approach. At a high level, a tipping point analysis varies the imputed values for the missing data (usually for the treatment arms only) by the exact amount needed to make a significant result turn nonsignificant.

Ratitch et al. (2013) proposed three variations of tipping point analysis using pattern imputation with a delta adjustment. Our discussion considers the variation in which standard multiple imputation is performed first, and then a delta adjustment (\(\delta \)) is applied simultaneously to all imputed values in the treatment group. The goal is to find the smallest \(\delta \) that turns the significant p-value (as calculated from the primary MMRM model) into a nonsignificant one. In addition to being relatively straightforward to interpret, this approach has the attractive quality of returning a quantitative result that is directly comparable on the scale of interest, which can then be put into clinical context. The following steps detail a bisection procedure to solve for this tipping point \(\delta \).

General Algorithm to Solve for the Tipping Point \({\varvec{\delta }}\)

Note the definitions of the following algorithm variables:

  • m: the number of imputations to be used in the multiple imputation procedure (a value should be prespecified in the study protocol).

  • d: the difference between the maximum and minimum values for the variable/endpoint under investigation (i.e., the maximum allowable shift).

  • df: degrees of freedom.

  • \(p_{\mathrm{target}}\): the target probability (e.g., the type I error rate).

  • \(t_{\mathrm{target}}\): the t-value corresponding to \(p_{\mathrm{target}}\). If lower values of Y represent higher levels of efficacy, then this value must be negated in the search algorithm because the target t-value needs to be negative. (Note that the corresponding degrees of freedom are actually a function of the data, as defined in Step 5 below.)

  • \(\epsilon \): a tolerance level, on the t-scale, under which convergence can be declared (e.g., 0.001).

  • \(p_{\mathrm{prim}}\): the p-value from the primary model.

Given a dataset with intermittent missing data, the basic algorithm to conduct the tipping-point analysis is outlined below; instructions for procedures conducted in SAS refer to SAS version 9.3 or later.

  0. Initialize \(\delta _L = -d\), \(\delta _C = 0\), \(\delta _U = d\), DONE \(=0\), COUNTER \(=0\).

  1. Using a Markov chain Monte-Carlo method (see, e.g., Schafer 1997), make the observed dataset monotone missing. This can be accomplished for each treatment group using proc mi in SAS with the option mcmc chain=multiple impute=monotone, in conjunction with all covariates (excluding treatment) included in the primary analysis model, and generates m monotone-missing datasets. The study protocol should specify the random seed used in this step. (A consolidated SAS sketch of Steps 1–5 follows the algorithm.)

  2. Applying parametric regression to the m monotone-missing datasets, impute the missing values in a stepwise fashion starting with the first postdose timepoint. This can be accomplished for each treatment group using proc mi with the monotone reg option, again including all covariates (excluding treatment) from the primary analysis model, and generates m complete datasets (one imputed dataset for each of the m monotone-missing datasets).

Do Steps 3–9 while Not DONE:

  3. Subtract \(\delta _C\) from each of the imputed values in the test drug treatment arm(s) (to the detriment of the test drug).

  4. Analyze each of the m post-imputation complete datasets using the primary model, obtaining a point estimate for the parameter of interest (e.g., the mean change-from-baseline treatment difference at the last timepoint) and the associated variance.

  5. Using proc mianalyze in SAS, combine the m means and variances from the m analyses to obtain the final test statistic and p-value, \(t_{\delta _C}\) and \(p_{\delta _C}\), respectively (Rubin 1987). The final test statistic \(\hat{Q}/V^{1/2}\) is approximately distributed as \(t_{\nu }\), where \(\hat{Q}\) is the sample mean of the m point estimates, \(V = \hat{U} + (1 + m^{-1})B\), \(\hat{U}\) is the sample mean of the m variance estimates, and B is the sample variance of the m point estimates. The degrees of freedom, \(\nu \), are computed as follows (Barnard and Rubin 1999): \(\nu = [(\nu _1)^{-1} + (\nu _2)^{-1}]^{-1}\), where \(\nu _1 = (m-1)\,[1 + \hat{U}/((1 + m^{-1})B)]^{2}\) and \(\nu _2 = (1-\gamma )\, \nu _0 (\nu _0 + 1)/(\nu _0 + 3)\), with \(\gamma = (1 + m^{-1})B/V\) and \(\nu _0\) the complete-data degrees of freedom.

  6. Compute DIFF \(= t_{\delta _C} - t_{\mathrm{target}}\).

  7. (a) If \(|\mathrm{DIFF}| < \epsilon \) then DONE \(=1\) (we are satisfied with \(\delta _C\));

     (b) else if DIFF > 0 then \(\delta _L = \delta _C\) and \(\delta _C = (\delta _C + \delta _U)/2\) (i.e., subtract a larger \(\delta \));

     (c) else if DIFF < 0 then \(\delta _U = \delta _C\) and \(\delta _C = (\delta _C + \delta _L)/2\) (i.e., subtract a smaller \(\delta \)).

  8. If not DONE, then COUNTER \(=\) COUNTER \(+\,1\).

  9. If COUNTER \(=50\), then DONE \(=1\) (guard against an endless loop due to nonconvergence).

The final \(\delta \) can be interpreted as the detrimental offset that must be applied to each imputed observation to change a significant result to a nonsignificant one. Confidence in the primary results stems from a large value of \(\delta \) relative to (a) the assumed treatment difference, (b) the observed treatment difference per the primary model, and/or (c) a widely accepted clinically meaningful difference. For example, in a trial of an anti-depressant drug in which a clinically meaningful difference might be around 2–3 points, a trial result could be considered robust if we could subtract \(\delta \ge 3\) points from every imputed value in the treatment arm and still maintain a statistically significant result.

Conventionally, the MMRM analysis is based on restricted maximum likelihood, while the tipping point methodology is implemented using multiple imputation (MI). Ideally, applying \(\delta = 0\) in the MI analysis would yield the p-value from the MMRM analysis (\(p_{\mathrm{prim}}\)); more importantly, setting \(p_{\mathrm{target}} = p_{\mathrm{prim}}\) would ideally yield a solution of \(\delta = 0\). If the \(\delta \) obtained does not equal 0, then the value of \(\delta \) obtained when setting \(p_{\mathrm{target}} = \alpha \) will be biased, per the intended interpretation. Unfortunately, simulation results indicate that the above method will not always yield \(\delta = 0\) when setting \(p_{\mathrm{target}} = p_{\mathrm{prim}}\). This inconsistency might be due to the additional variation in the multiple imputation analysis as compared to the restricted maximum likelihood analysis. To overcome this discrepancy, we advise running the above algorithm twice, in the following order: first with \(p_{\mathrm{target}} = p_{\mathrm{prim}}\) and then with \(p_{\mathrm{target}} = \alpha \), yielding \(\delta _{\mathrm{prim}}\) and \(\delta _{\alpha }\). The final \(\delta \) is then computed as \(\delta = \delta _{\alpha } - \delta _{\mathrm{prim}}\). The value \(\delta _{\mathrm{prim}}\) can be thought of as a calibration factor in going from the MMRM to the MI model, accounting for the methodological differences between the two as well as for the inherent randomness in the MI process. Simulation results indicate that the t-values and p-values arising from the MMRM and MI models (at \(\delta = 0\)) are highly similar, providing reassurance that the \(\delta \) (i.e., \(\delta _{\alpha } - \delta _{\mathrm{prim}}\)) obtained using the MI model translates well to the MMRM model.

As might be expected, higher values of m yield more stable results. This stability applies not only to the estimates produced by the MI approach, but also to the adjusted degrees of freedom (df). The df adjustment was first proposed by Barnard and Rubin (1999) and has subsequently gained widespread use (e.g., it is adopted in SAS). The major impetus for the adjustment, as compared to the initial df proposal cited in Rubin (1987), was to guard against the possibility that the df used for the MI approach would exceed the df present in the original MMRM for the complete data. However, this df adjustment can be very conservative in certain situations, particularly for smaller sample sizes when low numbers of imputations are used. This characteristic might produce abnormally large \(\delta \) values, since the respective t-values from the original MMRM and from the MI approach would be based on different t-distributions. A simple fix is to ensure that a sufficient number of imputations are used in the MI.

The following simulation study investigates (a) the variation of the df for the MI approach at \(\delta = 0\) for different values of m, (b) the differences between the MMRM and the MI approach (at \(\delta = 0\)) in the t-values and p-values, (c) the variation of \(\delta _{\mathrm{prim}}\) for different values of m, and (d) the variation of \(\delta = \delta _{\alpha } - \delta _{\mathrm{prim}}\) for various treatment effect sizes and CDRs.

Unless otherwise noted, the assumptions used in the simulation study of Sect. 2 were also used for all simulations in Sect. 3. For ease of interpretability, a simple MAR mechanism (MDP3 from Sect. 2) was assumed for the missing data.

Assessing the variation of the df at \({\varvec{\delta = 0}}\)

Table 2 Summary of variation for the df under the MI approach (at \(\delta = 0\)) by m (based on 1,000 simulations)

Moderate-to-large differences in df between the original MMRM and the MI model could cause convergence issues or unreliable results when attempting to solve for the tipping point \(\delta \). Table 2 summarizes the dfs from the MMRM and the MI model using 1,000 simulations. For the case of MDP3, the trial had about 80 % power with about 30 % missing data at Time 4. The tipping point analysis was performed for the roughly 800 simulated cases for which the MMRM results were significant. The dfs from the MMRM analyses varied between 150 and 204. However, the dfs for MI varied from 7 to 232 for \(m = 5\), and from 47 to 199 for \(m = 20\). Since the adjusted df are a direct function of the data itself, it is challenging to provide absolute general guidance as to how many imputations are enough. For the scenario considered in this section, \(m = 100\) appears sufficient to keep the variation of the df relatively small.

Assessing the variation of \({\varvec{\delta }}_{\mathbf{prim}}\)

Table 3 Summary of differences between the MMRM and the MI approach (at \(\delta = 0\)) in the t-values and p-values by m (based on 1,000 simulations)

As discussed above, an adjustment (\(\delta _{\mathrm{prim}}\)) is needed to account for the differences between the primary MMRM and the MI analysis used in the tipping point procedure. The following simulation study examines the variation of \(\delta _{\mathrm{prim}}\), as well as the differences in the t-values and p-values between the primary MMRM and the MI approach at \(\delta = 0\), for various values of m. The tolerance level for convergence of the t-values was set at \(\epsilon = 0.005\).

The simulation results in Table 3 indicate that the differences in both the t-values and the p-values between the MMRM and the MI model at \(\delta = 0\) are typically small, particularly for \(m \ge 100\). This finding provides general confidence that the MI model adequately approximates the MMRM.

Table 4 Summary of \(\delta _\mathrm{{prim}}\) by m (based on 100 simulations)

Due to the extensive computation required to estimate \(\delta _{\mathrm{prim}}\), only 100 simulations were conducted to investigate the variation of \(\delta _{\mathrm{prim}}\) as a function of m. Table 4 indicates that the variation of \(\delta _{\mathrm{prim}}\) generally decreases as m increases, with the percentiles generally shrinking toward 0 as m increases. However, this trend cannot be expected to continue as \(m \rightarrow \infty \), because some differences due to methodology will persist. For the examined scenario, no clear improvement was seen in moving from \(m = 100\) to \(m = 250\).

Assessing the Variation of the Final Tipping Point \({\varvec{\delta }}={\varvec{\delta }}_{\varvec{\alpha }}-{\varvec{\delta }}_\mathrm{prim}\)

The effect of various treatment differences and CDRs on the distribution of \(\delta \) was examined using the same simulation assumptions as before, but fixing \(m = 100\). Note that \(\mu _{\mathrm{Pbo}} = (0.00, 0.46, 0.92, 1.37, 1.83)\) is held constant, while \(\mu _{\mathrm{Act}}\) is set equal to \((1 - \theta )\, \mu _{\mathrm{Pbo}}\), for \(\theta = 0.35\) and 0.50, with larger values of \(\theta \) corresponding to greater efficacy (since higher values of \(\mu \) represent lower efficacy). Cumulative drop-out rates of (0.054, 0.106, 0.154, 0.200) and (0.086, 0.165, 0.236, 0.300) were considered.

As shown in Table 5, the offset \(\delta _{\mathrm{prim}}\) needed to align the results between the MMRM and the MI model is non-ignorable across the scenarios. Focusing on the scenario with a 30 % CDR at T4 and \(\theta = 0.50\), we note that the mean value of \(\delta _{\mathrm{prim}}\) needed to calibrate the two models was estimated as 0.14, with observed values ranging from \(-0.16\) to 0.56. As a frame of reference, the true treatment difference at T4 is \((1 - \theta )\, \mu _{\mathrm{Pbo}} - \mu _{\mathrm{Pbo}} = -0.5\,(1.83) = -0.92\).

Staying with the same scenario, the mean value of \(\delta \) is \(-1.34\). That is, on average, all of the missing values in the treatment arm would need to be adjusted by 1.34 points, to the detriment of treatment, in order to turn the significant p-value nonsignificant (i.e., equal to 0.05). Assuming these results were obtained for a single study, and in the context of an observed (or assumed) treatment difference of \(-0.92\), such a \(\delta \) can be considered evidence of a fairly robust treatment effect.

Conclusions across scenarios are best drawn by focusing on the estimated mean and quartiles (as opposed to the more variable quantities of the simulated minimum and maximum values). As expected, Table 5 demonstrates that larger detrimental values have to be applied to the imputed data from the treatment arms as (a) the drop-out rate goes down and (b) the true treatment effect goes up.

Table 5 Summary of variation of \(\delta _{\mathrm{prim}}\) and final tipping point \(\delta \) (based on 100 simulations)

Without going into great detail, one technical point bears mentioning. When applying \(\delta \) to the imputed values, it seems reasonable not to allow adjusted values past the minimum or maximum allowable value of the endpoint. However, this restriction might need to be relaxed while the convergence algorithm is running.
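A sketch of such a restriction, assuming an endpoint bounded between 0 and 52 (e.g., a HAMD-17 total score) and reusing the illustrative names from the sketch above:

```sas
/* Hypothetical sketch: clamp delta-adjusted imputed values to the
   endpoint's allowable range; the clamp may need to be relaxed while
   the bisection is still converging.                                   */
data clamped;
  set adjusted;
  array y{4} y1-y4;  array mis{4} m1-m4;
  do t = 1 to 4;
    if mis{t} = 1 then y{t} = max(0, min(52, y{t}));
  end;
  drop t;
run;
```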

4 Monte-Carlo Approaches for Control-Based Imputation Analysis

Control-based imputation (CBI) has recently been proposed as an approach for sensitivity analysis (Carpenter et al. 2013), in which different imputation methods are used for the treatment and control groups. The missing data in the control group are imputed under the assumption of MAR, while the missing data in the treatment group are imputed using the imputation model built from the control group. One of the primary assumptions of this CBI approach is that the true post-discontinuation efficacy response in the test drug group is similar to the efficacy response of the subjects continuing in the trial in the control group. This assumption might be reasonable when no rescue or other active medications are taken by patients who drop out (Mallinckrodt et al. 2013). In general, CBI provides a conservative estimate of the treatment effect in superiority trials. These methods have become more attractive recently because their assumptions are transparent and understandable to clinical trial scientists. The methods address an attributable treatment effect (estimand) under the intent-to-treat principle while excluding the potential confounding effect of rescue medications (Mallinckrodt et al. 2013). Thus, the estimand captures the causal-effect outcomes for the test therapy.

The three most commonly used CBI methods (Carpenter et al. 2013) are defined by specifying the mean profile after drop out in the treatment group using the profile in the control group, as follows:

  I. Copy Increments in Reference (CIR): the incremental mean change from the time of drop out for a patient in the treatment group is the same as the incremental mean change for a patient in the control group; that is, the mean profile after drop out for the treatment group is parallel to the mean profile of the control group.

  II. Jump to Reference (J2R): the mean profile after drop out for the test drug group equals the mean profile of the control group; that is, the mean profile for the test drug group ‘jumps’ from the test drug mean before drop out to the control mean after drop out.

  III. Copy Reference (CR): the mean profile for a drop-out patient in the test drug group equals the mean profile of the control group at all timepoints, including the timepoints before drop out.

These CBI approaches can be implemented using multiple imputation. Several SAS macros to implement this methodology have been developed by the Drug Information Association (DIA) Missing Data Working Group (macros are available at www.missingdata.org.uk).

Consider a response vector for patient i, \({\varvec{Y}}_i =\{Y_{ij}, j=1,\ldots ,t\}\), and assume

$$ {\varvec{Y}}_i |{\varvec{X}}_i \sim N\left( {{\varvec{\upmu }}_i ,\varSigma } \right) . $$

Let \(\mu _{ij}\) represent the mean for patient i at time j, with the MMRM specified as

$$\begin{aligned} \mu _{ij} =\alpha _j +\beta _j D_i +{\varvec{\gamma }}_j^{\prime } {\varvec{X}}_i , \end{aligned}$$
(2)

where \(\beta _j\) is the mean treatment difference from control at time j after adjusting for the covariates \({\varvec{X}}_i\), \(D_i\) is an indicator for treatment (1 for treatment and 0 for control), and \({\varvec{\gamma }}_j\) is a vector of coefficients for the covariates. The following steps can be used to implement the CBI analysis:

  1. Fit the MMRM, yielding the estimates \(\hat{\alpha }_j, \hat{\beta }_j, \hat{\varvec{\gamma }}_j\), and \(\hat{\Sigma }\);

  2. Assume noninformative priors for the parameters, and draw a sample of these parameters from their posterior distribution, denoted \(\alpha _j, \beta _j, {\varvec{\gamma }}_j\), and \({\varSigma }\). Note that the DIA Missing Data Working Group macros use SAS PROC MCMC to fit the MMRM and draw these parameters.

  3. For a patient who dropped out at time j, draw a sample from the conditional distribution to impute the missing vector, i.e.,

     $$\begin{aligned} \mathbf{y}_{\text {mis}} |\mathbf{y}_{\text {obs}} ,\text {X},{\varvec{\mu }}, {\Sigma }\sim \text {N}({\varvec{\mu }}_{\text {m}} + {\Sigma }_{\text {mo}} {\Sigma }_{\text {oo}}^{-1} \left( \mathbf{y}_{\text {obs}} -{\varvec{\mu }}_{\text {o}} \right) ,\; {\Sigma }_{\text {mm}} -{\Sigma }_{\text {mo}} {\Sigma }_{\text {oo}}^{-1} {\Sigma }_{\text {om}} ) \end{aligned}$$
     (3)

     where

     $$ {\varvec{\mu }} = \left( \begin{array}{l} {\varvec{\mu }}_{\mathrm{o}}\\ {\varvec{\mu }}_{\mathrm{m}} \end{array} \right) , \quad \Sigma = \left( \begin{array}{ll}\Sigma _{\mathrm{oo}} &{}\Sigma _{\mathrm{om}}\\ \Sigma _{\mathrm{mo}} &{}\Sigma _{\mathrm{mm}} \end{array} \right) $$

     are split into subvectors and block matrices with dimensions corresponding to the observed (indicated with ‘o’) and missing (indicated with ‘m’) portions of the response vector. The patient and time indicators i and j are omitted for simplicity. To implement the CBI, if a patient is in the placebo group, the \({\varvec{\mu }}_o\) and \({\varvec{\mu }}_m\) for the placebo group are used. Otherwise, the means are modified as specified per the chosen CBI approach. Specifically, a patient in the treatment group who dropped out after time j will have

     $$\begin{aligned} {\varvec{\mu }}_{\mathrm{m}}^{\mathrm{d}} =\left\{ {{\begin{array}{cl} {{\varvec{\mu }}_{\mathrm{m}}^{\mathrm{p}} +{\mu }_{\mathrm{j}}^{\mathrm{d}} -{\mu }_{\mathrm{j}}^{\mathrm{p}} }&{} \mathrm{{for\,CIR}} \\ {{\varvec{\mu }}_{\mathrm{m}}^{\mathrm{p}} }&{} \mathrm{{for\,J2R}} \\ {{\varvec{\mu }}_{\mathrm{m}}^{\mathrm{p}} +{\Sigma }_{\mathrm{mo}} {\Sigma }_{\mathrm{oo}}^{-1} ({\varvec{\mu }}_{\mathrm{o}}^{\mathrm{d}} -{\varvec{\mu }}_{\mathrm{o}}^{\mathrm{p}} )}&{} \mathrm{{for\,CR}} \\ \end{array} }} \right. \end{aligned}$$
     (4)

     where the superscripts d and p indicate the mean vector for drug or placebo, respectively (a PROC IML sketch of this step follows the list).

  4. Repeat Steps 2 and 3 to obtain the desired number of imputed datasets;

  5. Analyze each imputed dataset using the primary model (e.g., an ANCOVA model at the last visit) to obtain the estimated treatment difference and its standard error;

  6. Combine the results using Rubin’s rules for the final statistical inference (Rubin 1987).

Simulation studies show that the variances for the treatment differences estimated using the regular MI techniques are consistently larger than the corresponding empirical variances. This phenomenon has been noted for the copy-reference imputation method by Lu (2014) and Ayele et al. (2014). Lu (2014) proposed an analytical approach to obtain the correct variance estimate using the delta method. However, that approach is available only for copy-reference imputation and requires special programming for each specific analysis model.

Liu and Pang (2015) proposed methods to obtain more appropriate variances for the CBI estimates. One of their approaches is a Bayesian MCMC-based method that accounts for the pattern of missing data and obtains the estimates for the treatment difference and its variance from empirical MCMC samples. Based on the mean profiles specified in Eq. (4), the overall treatment difference at the last time point under CBI can be written as a weighted average over the missing data patterns, that is,

$$ \theta ^{\mathrm{CBI}}=\sum _{j=1}^t \pi _j \mu _{tj}^{d} -\mu _t^{p} , $$

where \(\mu _{tj}^d\) is the mean at the last time point t under missing data pattern j, as given in (4), and \(\{\pi _j, j=1,\ldots ,t\}\) are the proportions of patients in the missing data patterns for the drug group. As \(\sum _{j=1}^t \pi _j =1\), we have

$$\begin{aligned} \theta ^{\mathrm{CBI}}=\left\{ {{\begin{array}{ll} {\sum _{j=1}^t \pi _j (\mu _j^d -\mu _j^p )}&{} \mathrm{{for\,CIR}} \\ {\pi _t (\mu _t^d -\mu _t^p )}&{} \mathrm{{for\,J2R}} \\ {\sum _{j=1}^t \pi _j (\mu _{tj}^d -\mu _t^p )}&{} \mathrm{{for\,CR}} \\ \end{array} }} \right. \end{aligned}$$
(5)

where \(\mu _j^d\) and \(\mu _j^p\) are the means at time j for drug and placebo, respectively. Therefore, the treatment effect under CBI can be expressed as a linear combination of the MMRM parameters and the proportions of patients in each missing data pattern. We note that this approach is a special pattern-mixture model (PMM) in which the missing data are handled differently by missing data pattern only for the treatment group; the missing data in the placebo group are all imputed assuming a MAR process.

To account for the uncertainty in the proportions of missing data \(\{\pi _j, j=1,\ldots ,t\}\), random proportions are also drawn from a Dirichlet distribution in the Bayesian MCMC process, which corresponds to the posterior distribution for the proportions under a Jeffreys prior. The empirical distribution and statistical inference for \(\theta ^{\mathrm{CBI}}\) are obtained from the MCMC samples. Specifically, the following steps are implemented in the process:

  1. Specify flat priors for \(\alpha _j, \beta _j, {\varvec{\gamma }}_j\), and \({\Sigma }\); for example, use N(0, \(\sigma ^2 = 10000\)) for \(\alpha _j\), \(\beta _j\), and each element of \({\varvec{\gamma }}_j\), and an inverse Wishart distribution IW(\(t+1\), I), where I is an identity matrix of dimension t. We used SAS PROC MCMC to fit the MMRM, specifically:

     a. Use conjugate sampling to draw a sample for \({\Sigma }\);

     b. Use random-walk Metropolis sampling to draw samples for \(\alpha _j, \beta _j, {\varvec{\gamma }}_j\);

     c. For a patient who dropped out at time j, PROC MCMC draws a sample for the missing vector \(\mathbf{y}_{\mathrm{mis}}\) from the conditional distribution specified in (3), with the parameters from above;

     d. Draw \(\{\pi _j, j=1,\ldots ,t\}\) from the Dirichlet\((m_1 +0.5,\ldots ,m_t +0.5)\) distribution, where \(m_j\) is the number of treatment-group patients who dropped out at time \(j+1\) and \(m_t\) is the number of completers (a small sketch of this draw follows the list);

     e. Evaluate \(\theta ^{\mathrm{CBI}}\) using formula (5).

  2. Run the above process with a burn-in, then repeat Steps a through e to obtain an empirical posterior distribution for \(\theta ^{\mathrm{CBI}}\).

Note that this Bayesian MCMC process is a simulation-based approach, so it is important to check the convergence of the MCMC samples. Usually, the trace plot is examined visually, or statistical measures such as the Geweke or Raftery-Lewis diagnostics provided by SAS PROC MCMC are checked.

We applied the regular MI analysis and the Bayesian MCMC approach to an antidepressant drug trial dataset created by the DIA Missing Data Working Group (Mallinckrodt et al. 2013). The dataset was constructed from an actual clinical trial and made available by the Working Group (see www.missingdata.org.uk). It contains 172 patients (84 in the treatment arm, 88 in the placebo control arm). Repeated measures of the Hamilton Depression 17-item total score were taken at baseline and at Weeks 1, 2, 4, and 6 post-randomization. The Week 6 measurements were completed by about 76 % of the treatment group patients and about 74 % of the control group patients. The analysis dataset included one patient record with intermittent missing data; in all analysis methods, the missing data for this patient were imputed under the assumption of MAR. The monotone missing data were imputed under the CBI methods CIR, CR, and J2R.

In the analysis of this dataset, we noticed that the MCMC sampling had high autocorrelation. To increase the stability of the results, we used 200 imputations in the conventional MI analysis, and for the Bayesian approach used 2,000 iterations for tuning, 2,000 iterations for burn-in, and 200,000 iterations in the main sampling, keeping one of every 10 samples (option THIN=10 in PROC MCMC) to obtain a total of 20,000 samples for the posterior mean and standard deviation. Table 6 shows the analysis results. The Bayesian MCMC approach under MAR produced results very similar to the mixed model analysis. Compared with the mixed model results, the CBI analyses based on regular MI are conservative: the point estimates are shrunk toward 0 while the standard errors (SEs) remain very similar to the SEs from the MAR analysis. As such, the CBI analyses with regular MI have larger p-values than the primary analysis under MAR; in fact, the result of the J2R analysis becomes nonsignificant. With the Bayesian MCMC approach, the CBI analyses have point estimates similar to those from the CBI analyses with regular MI but smaller SEs. As a result, the p-values from CIR, CR, and J2R all maintained significance.

Table 6 Primary and sensitivity analysis results for an anti-depressant trial

To check the convergence of the MCMC sampling, Fig. 1 shows the diagnostic plots for the Bayesian MCMC samples under the primary MAR analysis and under the CR analysis. The trace plots for both parameters show good mixing and stabilization. With the option THIN=10, the autocorrelation decreases quickly. The posterior density curves are estimated well from the 20,000 samples.

Fig. 1 Diagnostic plots for Bayesian MCMC under MAR and for the copy reference imputation method

5 Discussions and Remarks

In many clinical trials, missing data are unavoidable. We have illustrated some applications of Monte-Carlo simulation methods for handling missing data issues in longitudinal clinical trials. Simulation-based approaches to dealing with missing data can be extremely useful in the conduct of clinical trials, most notably in the design stage to calculate the needed sample size and power, as well as in the final analysis stage to conduct sensitivity analyses. We described a method to generate MVN longitudinal data under different assumed MDMs with specified CDRs. As a sensitivity analysis, we applied a \(\delta \)-adjustment approach to account for the potential difference between the MMRM (typically used as the primary model in clinical trials) and the MI model used to facilitate the tipping point analysis, and proposed an adjustment to the final tipping point calculation. Depending on the number of imputations used, the inferential statistics produced by the MMRM and MI models can differ, due in part to differences in the approximated degrees of freedom. A sufficient number of imputations should be used to reduce this variation, and the appropriate number should be confirmed via simulation by the statistician during the analysis planning stage. We also presented a Bayesian MCMC method for CBI that provides a more appropriate variance estimate than regular multiple imputation.

The methods presented are only a few applications of simulation methods to missing data issues; many other simulation-based methods are available. For example, we considered only a logistic model for the MDM, but other models, such as a probit model, can also be used. In addition, the missing probabilities defined by the example MDMs depended only on the current and/or the previous time point; other MDMs may be defined to incorporate additional time points. For the CBI approaches, we considered a Bayesian MCMC method, although other avenues, such as bootstrapping, might also be used to obtain an appropriate variance for CBI. Although we considered only continuous endpoints, simulation-based methods can also be very useful in dealing with missing data for other types of endpoints, such as binary, categorical, or time-to-event data.

One drawback of simulation-based methods is the random variation introduced by the simulations themselves; it is critical to assess this variation and/or monitor convergence. When using simulation-based methods for the analysis of clinical trials, the analysis plan should prespecify all algorithms, software packages, and random seeds for the computation. Naturally, the analysis should use a sufficient number of imputations or replications to reduce the random variation. Of course, simulations examine the statistical properties only under the assumptions used in those simulations. Whenever possible, theoretical or analytic methods should be considered over simulations.