The randomized controlled clinical trial is the standard by which all trials are judged. In the simplest case, randomization is a process by which each participant has the same chance of being assigned to either intervention or control. An example would be the toss of a coin, in which heads indicates intervention group and tails indicates control group. Even in the more complex randomization strategies, the element of chance underlies the allocation process. Of course, neither trial participant nor investigator should know what the assignment will be before the participant’s decision to enter the study. Otherwise, the benefits of randomization can be lost. The role that randomization plays in clinical trials has been discussed in Chap. 5 as well as by numerous authors [1–12]. While not all accept that randomization is essential [10, 11], most agree it is the best method for achieving comparability between study groups, and the most appropriate basis for statistical inference [1, 3].

Fundamental Point

Randomization tends to produce study groups comparable with respect to known as well as unknown risk factors, removes investigator bias in the allocation of participants, and guarantees that statistical tests will have valid false positive error rates.

Several methods for randomly allocating participants are used [6, 9, 12–14]. This chapter will present the most common of these methods and consider the advantages and disadvantages of each. Unless stated otherwise, it can be assumed that the randomization strategy will allocate participants into two groups, an intervention group and a control group. However, many of the methods described here can easily be generalized for use with more than two groups.

Two forms of experimental bias are of concern. The first, selection bias, occurs if the allocation process is predictable [5, 15–18]. In this case, the decision to enter a participant into a trial may be influenced by the anticipated treatment assignment. If any bias exists as to what treatment particular types of participants should receive, then a selection bias might occur. All of the randomization procedures described avoid selection bias by not being predictable. A second bias, accidental bias, can arise if the randomization procedure does not achieve balance on risk factors or prognostic covariates. Some of the allocation procedures described are more vulnerable to accidental bias, especially for small studies. For large studies, however, the chance of accidental bias is negligible [5].

Whatever randomization process is used, the report of the trial should contain a brief but clear description of that method. In the 1980s, Altman and Doré [15] reported a survey of four medical journals in which 30% of published randomized trials gave no evidence that randomization had in fact been used. As many as 10% of these “randomized” trials in fact used non-random allocation procedures. Sixty percent did not report the type of randomization that was used. In one review in the 1990s, only 20–30% of trials provided fair or adequate descriptions, depending on the size of the trial or whether the trial was single center or multicenter [18]. More recently, a review of 253 trials published in five major medical journals after the release of the Consolidated Standards of Reporting Trials (CONSORT) [19] recommendations found little improvement in reports of how randomization was accomplished [20]. Descriptions need not be lengthy to inform the reader; publications should clearly indicate the type of randomization method and how the randomization was implemented.

Fixed Allocation Randomization

Fixed allocation procedures assign the interventions to participants with a prespecified probability, usually equal, and that allocation probability is not altered as the study progresses. A number of methods exist by which fixed allocation is achieved [6, 9, 12, 14, 21–25], and we will review three of these—simple, blocked, and stratified.

Our view is that allocation to intervention and control groups should be equal unless there are compelling reasons to do otherwise. Peto [7], among others, has suggested an unequal allocation ratio, such as 2:1, of intervention to control. The rationale for such an allocation is that the study loses a little sensitivity but may gain more information about participant responses to the new intervention, such as toxicity and side effects. In some instances, less information may be needed about the control group and, therefore, fewer control participants are required. If the intervention turns out to be beneficial, more study participants would benefit than under an equal allocation scheme. However, new interventions may also turn out to be harmful, in which case more participants would receive them under the unequal allocation strategy. Although the loss of sensitivity or power may be less than 5% for allocation fractions between approximately 1/2 and 2/3 (that is, ratios up to 2:1) [8, 21], equal allocation is the most powerful design and therefore generally recommended. We also believe that equal allocation is more consistent with the view of indifference or equipoise toward which of the two groups a participant is assigned (see Chap. 2). Unequal allocation may indicate to the participants and to their personal physicians that one intervention is preferred over the other. In a few circumstances, the cost of one treatment may be extreme, so that an unequal allocation of 2:1 or 3:1 may help to contain costs while not causing a serious loss of power. Thus, there are tradeoffs that must be considered. In general, equal allocation will be presumed throughout the following discussion unless otherwise indicated.

Simple Randomization

The most elementary form of randomization, referred to as simple or complete randomization, is best illustrated by a few examples [9, 12]. One simple method is to toss an unbiased coin each time a participant is eligible to be randomized. For example, if the coin turns up heads, the participant is assigned to group A; if tails, to group B. Using this procedure, approximately one half of the participants will be in group A and one half in group B. In practice, for small studies, instead of tossing a coin to generate a randomization schedule, simple randomization is usually accomplished with a random digit table, on which the equally likely digits 0 to 9 are arranged by rows and columns. By randomly selecting a certain row (column) and observing the sequence of digits in that row (column), A could be assigned, for example, to those participants for whom the next digit is even and B to those for whom the next digit is odd. This process produces a sequence of assignments which is random in order, and each participant has an equal chance of being assigned to A or B.

For large studies, a more convenient method for producing a randomization schedule is to use a random number producing algorithm, available on most computer systems. A simple randomization procedure might assign participants to group A with probability p and participants to group B with probability 1 − p. One computerized process for simple randomization is to use a uniform random number algorithm to produce random numbers in the interval from 0.0 to 1.0. Using a uniform random number generator, a random number can be produced for each participant. If the random number is between 0 and p, the participant would be assigned to group A; otherwise to group B. For equal allocation, the probability cut point, p, is one-half (i.e., p = 0.50). If equal allocation between A and B is not desired (p ≠ 1/2), then p can be set to the desired proportion in the algorithm and the study will have, on the average, a proportion p of the participants in group A.
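
As a minimal sketch of this procedure in Python (the function name and group labels are ours, for illustration only), a uniform random number is drawn for each participant and compared with the probability cut point p:

```python
import random

def simple_randomization(n_participants, p=0.5, seed=None):
    """Assign each participant to group A with probability p, else group B."""
    rng = random.Random(seed)
    assignments = []
    for _ in range(n_participants):
        u = rng.random()               # uniform random number in [0.0, 1.0)
        assignments.append("A" if u < p else "B")
    return assignments

# Equal allocation (p = 0.50) for 20 participants
print("".join(simple_randomization(20, seed=1)))
```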

This procedure can be adapted easily to more than two groups. Suppose, for example, the trial has three groups, A, B and C, and participants are to be randomized such that a participant has a 1/4 chance of being in group A, a 1/4 chance of being in group B, and a 1/2 chance of being in group C. By dividing the interval 0 to 1 into three pieces of length 1/4, 1/4, and 1/2, random numbers generated will have probabilities of 1/4, 1/4 and 1/2, respectively, of falling into each subinterval. Specifically, the intervals would be <0.25, 0.25–0.50, and ≥0.50. Then any participant whose random number is less than 0.25 is assigned A, any participant whose random number falls between 0.25 and 0.50 is assigned B and the others, C. For equal allocation, the interval would be divided into thirds and assignments made accordingly.
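
The same logic extends to several groups using cumulative cut points. A sketch of the 1/4, 1/4, 1/2 example above (group labels and names are illustrative):

```python
import random

def multi_group_randomization(n, groups=("A", "B", "C"),
                              probs=(0.25, 0.25, 0.50), seed=None):
    """Divide [0, 1) into subintervals of lengths probs; assign each
    participant according to where a uniform random number falls."""
    rng = random.Random(seed)
    cuts, total = [], 0.0
    for prob in probs:
        total += prob
        cuts.append(total)             # cumulative cut points: 0.25, 0.50, 1.0
    cuts[-1] = 1.0                     # guard against floating-point rounding
    assignments = []
    for _ in range(n):
        u = rng.random()
        for group, cut in zip(groups, cuts):
            if u < cut:
                assignments.append(group)
                break
    return assignments

print(multi_group_randomization(12, seed=7))
```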

The advantage of this simple randomization procedure is that it is easy to implement. The major disadvantage is that, although in the long run the number of participants in each group will be in the proportion anticipated, at any point in the randomization, including the end, there could be a substantial imbalance [23]. This is particularly true if the sample size is small. For example, if 20 participants are randomized with equal probability to two treatment groups, the chance of a 12:8 split (i.e., 60% A, 40% B) or worse is approximately 50%. For 100 participants, the chance of the same ratio (60:40 split) or worse is only 5%. While such imbalances do not cause the statistical tests to be invalid, they do reduce the ability to detect true differences between the two groups. In addition, such imbalances appear awkward and may lead to some loss of credibility for the trial, especially for the person not oriented to statistics. For this reason primarily, simple randomization is not often used, even for large studies. In addition, interim analysis of accumulating data might be difficult to interpret with major imbalances in the number of participants per arm, especially for smaller trials.

Some investigators incorrectly believe that an alternating assignment of participants to the intervention and the control groups (e.g., ABABAB …) is a form of randomization. However, no random component exists in this type of allocation except perhaps for the first participant. A major criticism of this method is that, in a single-blind or unblinded study, the investigators know the next assignment, which could lead to a bias in the selection of participants. Even in a double-blind study, if the blind is broken on one participant as sometimes happens, the entire sequence of assignments is known. Therefore, this type of allocation method should be avoided.

Blocked Randomization

Blocked randomization, sometimes called permuted block randomization, was described by Hill [4] in 1951. It avoids serious imbalance in the number of participants assigned to each group, an imbalance which could occur in the simple randomization procedure. More importantly, blocked randomization guarantees that at no time during randomization will the imbalance be large and that at certain points the number of participants in each group will be equal [9, 12, 26]. This protects against temporal trends during enrollment, which is often a concern for larger trials with long enrollment phases.

If participants are randomly assigned with equal probability to groups A or B, then for each block of even size (for example, 4, 6 or 8) one half of the participants will be assigned to A and the other half to B. The order in which the interventions are assigned in each block is randomized, and this process is repeated for consecutive blocks of participants until all participants are randomized. For example, the investigators may want to ensure that after every fourth randomized participant, the number of participants in each intervention group is equal. Then a block of size 4 would be used and the process would randomize the order in which two A’s and two B’s are assigned for every consecutive group of four participants entering the trial. One may write down all the ways of arranging the groups and then randomize the order in which these combinations are selected. In the case of block size 4, there are six possible combinations of group assignments: AABB, ABAB, BAAB, BABA, BBAA, and ABBA. One of these arrangements is selected at random and the four participants are assigned accordingly. This process is repeated as many times as needed.
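
A short sketch of this arrangement-based approach (block size, labels, and the function name are illustrative) randomly selects one of the distinct arrangements for each consecutive block:

```python
import itertools
import random

def permuted_block_schedule(n_participants, block_size=4, seed=None):
    """Blocked randomization: for each block, pick at random one of the
    distinct arrangements of b/2 A's and b/2 B's (6 arrangements for b = 4)."""
    rng = random.Random(seed)
    half = block_size // 2
    arrangements = sorted(set(itertools.permutations("A" * half + "B" * half)))
    schedule = []
    while len(schedule) < n_participants:
        schedule.extend(rng.choice(arrangements))
    return schedule[:n_participants]

print("".join(permuted_block_schedule(16, seed=3)))
```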

Another method of blocked randomization may also be used. In this method, the order of assignments within a block of size b is randomized by obtaining a random number between 0 and 1 for each of the b assignments (half of which are A and the other half B). The example below illustrates the procedure for a block of size four (two A’s and two B’s). Four random numbers are drawn between 0 and 1 in the order shown.

Assignment   Random number   Rank
A            0.069           1
A            0.734           3
B            0.867           4
B            0.312           2

The assignments are then ranked according to the size of the random numbers. This leads to the assignment order ABAB. This process is repeated for another set of four participants until all have been randomized.
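
A minimal sketch of this ranking method (block size and seed are illustrative):

```python
import random

def block_by_ranking(block_size=4, seed=None):
    """Order b/2 A's and b/2 B's by ranking a random number drawn for each."""
    rng = random.Random(seed)
    half = block_size // 2
    labels = ["A"] * half + ["B"] * half
    draws = [(rng.random(), label) for label in labels]
    # Rank the assignments according to the size of the random numbers
    return [label for _, label in sorted(draws)]

print(block_by_ranking(seed=11))       # e.g., ['A', 'B', 'A', 'B']
```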

The advantage of blocking is that balance between the number of participants in each group is guaranteed during the course of randomization. The number in each group will never differ by more than b/2 when b is the length of the block. This can be important for at least two reasons. First, if the type of participant recruited for the study changes during the entry period, blocking will produce more comparable groups. For example, an investigator may use different sources of potential participants sequentially. Participants from these sources may vary in severity of illness or other crucial respects. One source, with the more seriously ill participants, may be used early during enrollment and another source, with healthier participants, late in enrollment [3]. If the randomization were not blocked, more of the seriously ill participants might be randomized to one group. Because the later participants are not as sick, this early imbalance would not be corrected. A second advantage of blocking is that if the trial should be terminated before enrollment is completed, balance will exist in terms of number of participants randomized to each group.

A potential, but solvable problem with basic blocked randomization is that if the blocking factor b is known by the study staff and the study is not double-blind, the assignment for the last person entered in each block is known before entry of that person. For example, if the blocking factor is 4 and the first three assignments are ABB, then the next assignment must be A. This could, of course, permit a bias in the selection of every fourth participant to be entered. Clearly, there is no reason to make the blocking factor known. However, in a study that is not double-blind, with a little ingenuity the staff can soon discover the blocking factor. For this reason, repeated blocks of size 2 should not be used. On a few occasions, perhaps as an intellectual challenge, investigators or their clinic staff have attempted to break the randomization scheme [27]. This curiosity is natural but nevertheless can lead to selection bias. To avoid this problem in the trial that is not double-blind, the blocking factor can be varied as the recruitment continues. In fact, after each block has been completed, the size of the next block could be determined in a random fashion from a few possibilities such as 2, 4, 6, and 8. The probabilities of selecting a block size can be set at whatever values one wishes with the constraint that their sum equals 1.0. For example, the probabilities of selecting block sizes 2, 4, 6, and 8 can be 1/6, 1/6, 1/3, and 1/3 respectively. Randomly selecting the block size makes it very difficult to determine where blocks start and stop and thus determine the next assignment.
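
Randomly varying block sizes might be implemented along these lines; the sizes and selection probabilities are the ones mentioned above, while the function name is ours:

```python
import random

def varied_block_schedule(n_participants, sizes=(2, 4, 6, 8),
                          weights=(1/6, 1/6, 1/3, 1/3), seed=None):
    """Blocked randomization with block sizes chosen at random; the
    selection probabilities must sum to 1.0."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        b = rng.choices(sizes, weights=weights, k=1)[0]   # next block size
        block = ["A"] * (b // 2) + ["B"] * (b // 2)
        rng.shuffle(block)             # random order within the block
        schedule.extend(block)
    return schedule[:n_participants]   # truncate the final block if needed

print("".join(varied_block_schedule(20, seed=5)))
```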

A disadvantage of blocked randomization is that, from a strictly theoretical point of view, analysis of the data is more complicated than if simple randomization were used. Unless the data analysis performed at the end of the study reflects the randomization process actually performed [26, 28–30], it may be incorrect, since standard analytical methods assume a simple randomization. In their analysis of the data, most investigators ignore the fact that the randomization was blocked. Matts and Lachin [26] studied this problem and concluded that the measurement of variability used in the statistical analysis is not exactly correct if the blocking is ignored. Usually the analysis ignoring blocks is conservative, though it can be anticonservative, especially when the blocks are small (e.g., a block size of two). That is, the analysis ignoring blocks will probably have slightly less power than the correct analysis and will understate the “true” significance level. Since blocking guarantees balance between the two groups and, therefore, increases the power of a study, blocked randomization with the appropriate analysis is more powerful than not blocking at all, or than blocking and then ignoring it in the analysis [26]. Also, the correct treatment of blocking would be difficult to extend to more complex analyses. Being able to use a single, straightforward analytic approach that handles covariates, subgroups, and other secondary analyses simplifies interpretation of the trial as a whole. Performing the most correct analysis is even more problematic for adaptive designs, as discussed in the next section.

Stratified Randomization

One of the objectives in allocating participants is to achieve between-group comparability of certain characteristics known as prognostic or risk factors [12, 31–44]. These are baseline factors which correlate with subsequent participant response or outcome. Investigators may become concerned when prognostic factors are not evenly distributed between intervention and control groups. As indicated previously, randomization tends to produce groups which are, on the average, similar in their entry characteristics, whether known, unknown, or unmeasured. This tendency can be relied on for large studies, or across many small studies on average, but for any single study, especially a small one, there is no guarantee that all baseline characteristics will be similar in the two groups. In the multicenter Aspirin Myocardial Infarction Study [45], which had 4,524 participants, the top 20 cardiovascular prognostic factors for total mortality identified in the Coronary Drug Project [43] were compared in the intervention and control groups, and no major differences were found (Furberg CD, unpublished data). However, individual clinics, with an average of 150 participants, showed considerable imbalance for many variables between the groups. Imbalances in prognostic factors can either be dealt with after the fact by using stratification in the analysis (Chap. 18) or be prevented by using stratification in the randomization. Stratified randomization is a method which helps achieve comparability between the study groups for those factors considered.

Stratified randomization requires that the prognostic factors be measured either before or at the time of randomization. If a single factor is used, it is divided into two or more subgroups or strata (e.g., age 30–34 years, 35–39 years, 40–44 years). If several factors are used, a stratum is formed by selecting one subgroup from each of them. The total number of strata is the product of the number of subgroups in each factor. The stratified randomization process involves measuring the level of the selected factors for a participant, determining to which stratum she belongs and performing the randomization within that stratum.

Within each stratum, the randomization process itself could be simple randomization, but in practice most clinical trials use some blocked randomization strategy. Under a simple randomization process, imbalances in the number in each group within the stratum could easily happen and thus defeat the purpose of the stratification. Blocked randomization is, as described previously, a special kind of stratification. However, this text will restrict use of the term blocked randomization to stratifying over time, and use stratified randomization to refer to stratifying on factors other than time. Some confusion may arise here because early texts on design used the term blocking as this book uses the term stratifying. However, the definition herein is consistent with current usage in clinical trials.

As an example of stratified randomization with a block size of 4, suppose an investigator wants to stratify on age, sex, and smoking history. One possible classification of the factors would be three 10-year age levels, two sex categories, and three smoking levels.

Age (years)   Sex      Smoking history
1. 40–49      Male     Current smoker
2. 50–59      Female   Ex-smoker
3. 60–69               Never smoked

Thus, the design has 3 × 2 × 3 = 18 strata. The randomization for this example appears in Table 6.1.

Table 6.1 Stratified randomization with block size of 4

Participants who were between 40 and 49 years old, male and current smokers, that is, in stratum 1, would be assigned to groups A or B in the sequences ABBA BABA .... Similarly, random sequences would appear in the other strata.
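
A hedged sketch of how such per-stratum schedules might be generated, using the 18 strata of this example (the number of blocks and the labels are illustrative):

```python
import itertools
import random

# The 18 strata from the example: 3 age levels x 2 sexes x 3 smoking levels
AGES = ("40-49", "50-59", "60-69")
SEXES = ("Male", "Female")
SMOKING = ("Current smoker", "Ex-smoker", "Never smoked")

def stratified_schedules(n_blocks=5, block_size=4, seed=None):
    """Generate an independent permuted-block schedule for each stratum."""
    rng = random.Random(seed)
    schedules = {}
    for stratum in itertools.product(AGES, SEXES, SMOKING):
        sequence = []
        for _ in range(n_blocks):
            block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
            rng.shuffle(block)         # random order within each block
            sequence.extend(block)
        schedules[stratum] = sequence
    return schedules

schedules = stratified_schedules(seed=42)
print("".join(schedules[("40-49", "Male", "Current smoker")]))  # stratum 1
```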

Small studies are the ones most likely to require stratified randomization, because in large studies the magnitude of the numbers increases the chance of comparability of the groups. In the example shown above, with three levels of the first factor (age), two levels of the second factor (sex), and three levels of the third factor (smoking history), 18 strata have been created. As factors are added and the levels within factors are refined, the number of strata increases rapidly. If the example with 18 strata had 100 participants to be randomized, only five to six participants would be expected per stratum if the study population were evenly distributed among the levels. Since the population is most likely not evenly distributed over the strata, some strata would actually have fewer than five to six participants. If the number of strata were increased, the number of participants in each stratum would be even fewer. Pocock and Simon [41] showed that increased stratification in small studies can be self-defeating because of the sparseness of data within each stratum. Thus, only important variables should be chosen and the number of strata kept to a minimum.

In addition to making the two study groups appear comparable with regard to specified factors, the power of the study can be increased by taking the stratification into account in the analysis. Stratified randomization, in a sense, breaks the trial down into smaller trials. Participants in each of the “smaller trials” belong to the same stratum. This reduces variability in group comparisons if the stratification is used in the analysis. Reduction in variability allows a study of a given size to detect smaller group differences in response variables or to detect a specified difference with fewer participants [22, 26].

Sometimes the variables initially thought to be most prognostic and, therefore, used in the stratified randomization turn out to be unimportant. Other factors may be identified later which, for the particular study, are of more importance. If randomization is done without stratification, then the analysis can take into account those factors of interest and will not be complicated by factors thought to be important at the time of randomization. It has been argued that there is usually no need to stratify at randomization because stratification at the time of analysis will achieve nearly the same expected power [7]. This issue of stratifying pre- versus post-randomization has been widely discussed [35–38, 42]. It appears for a large study that stratification after randomization provides nearly equal efficiency to stratification before randomization [39, 40]. However, for studies of 100 participants or fewer, stratifying the randomization using two or three prognostic factors may achieve greater power, although the increase may not be large.

Stratified randomization is not the complete solution to all potential problems of baseline imbalance. Another strategy for small studies with many prognostic factors is considered below in the section on adaptive randomization.

In multicenter trials, centers vary with respect to the type of participants randomized as well as the quality and type of care given to participants during follow-up. Thus, the center may be an important factor related to participant outcome, and the randomization process should be stratified accordingly [33]. Each center then represents, in a sense, a replication of the trial, though the number of participants within a center is not adequate to answer the primary question. Nevertheless, results at individual centers can be compared to see if trends are consistent with overall results. Another reason for stratification by center is that if a center should have to leave the study, the balance in prognostic factors in other centers would not be affected.

One further point might need consideration. If in the stratified randomization, a specific proportion or quota is intended for each stratum, the recruitment of eligible participants might not occur at the same rate. That is, one stratum might meet the target before the others. If a target proportion is intended, then plans need to be in place to close down recruitment for that stratum, allowing the others to be completed.

Adaptive Randomization Procedures

The randomization procedures described in the sections on fixed allocation above are non-adaptive strategies. In contrast, adaptive procedures change the allocation probabilities as enrollment progresses. Two types of adaptive procedures will be considered here. First, we will discuss methods which adjust or adapt the allocation probabilities according to imbalances in numbers of participants or in baseline characteristics between the two groups. Second, we will briefly review adaptive procedures that adjust allocation probabilities according to the responses of participants to the assigned intervention.

Baseline Adaptive Randomization Procedures

Two common methods for adaptive allocation which are designed to make the number of participants in each study group equal or nearly equal are biased coin randomization and urn randomization. Both operate by changing the allocation probability over time, making adaptations based only on the number of participants in each group, though they can be modified to perform allocation within strata in the same way as blocked randomization.

The Biased Coin Randomization procedure, originally discussed by Efron [46], attempts to balance the number of participants in each treatment group based on the previous assignments, but does not take participant responses into consideration. Several variations to this approach have been discussed [47–63]. The purpose of the algorithm is basically to randomize the allocation of participants to groups A and B with equal probability as long as the number of participants in each group is equal or nearly equal. If an imbalance occurs and the difference in the number of participants is greater than some prespecified value, the allocation probability (p) is adjusted so that it is higher for the group with fewer participants. The investigator can determine the value of the allocation probability. The larger the value of p, the faster the imbalance will be corrected, while the nearer p is to 0.5, the slower the correction. Efron suggests an allocation probability of p = 2/3 when a correction is indicated. Since much of the time p is greater than 1/2, the process has been named the “biased coin” method. As a simple example, suppose nA and nB represent the number of participants in groups A and B respectively. If nA is less than nB and the difference exceeds a predetermined value, D, then we allocate the next participant to group A with probability p = 2/3. If nA exceeds nB by more than D, we allocate to group B with probability p = 2/3. Otherwise, p is set at 0.50. This procedure can be modified to include consideration of the number of consecutive assignments to the same group and the length of such a run. Some procedures for which the allocation probability also depends on differences in baseline characteristics, as discussed below, are sometimes also called “biased coin” designs.
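
A minimal sketch of this rule (the threshold parameter d and the names are illustrative; d = 0 corrects any imbalance):

```python
import random

def biased_coin_assign(n_a, n_b, d=0, p=2/3, rng=random):
    """Efron's biased coin (sketch): once the imbalance exceeds d,
    assign to the smaller group with probability p; otherwise use 0.5."""
    if n_a - n_b > d:          # too many in A: favor B
        prob_a = 1 - p
    elif n_b - n_a > d:        # too many in B: favor A
        prob_a = p
    else:                      # acceptably balanced: fair coin
        prob_a = 0.5
    return "A" if rng.random() < prob_a else "B"

# Sequentially randomize 20 participants
counts = {"A": 0, "B": 0}
for _ in range(20):
    group = biased_coin_assign(counts["A"], counts["B"])
    counts[group] += 1
print(counts)
```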

Another similar adaptive randomization method is referred to as the Urn Design, based on the work of Wei and colleagues [64–67]. This method also attempts to keep the number of participants randomized to each group reasonably balanced as the trial progresses. The name Urn Design refers to the conceptual process of randomization. Imagine an urn filled with m red balls and m black balls. If a red ball is drawn at random, assign the participant to group A, return the red ball, and add one (or more than one) black ball to the urn. If a black ball is drawn, assign the participant to group B, return that ball, and add one (or more than one) red ball to the urn. This process will tend to keep the number of participants in each group reasonably close because, like the biased coin procedure, it adjusts the allocation probability to be higher for the smaller group. How much imbalance there might be over time depends on m and how many balls are added after each draw.
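
The urn scheme might be sketched as follows, with m and the number of balls added per draw (alpha, our name for it) as parameters:

```python
import random

def urn_randomization(n_participants, m=1, alpha=1, seed=None):
    """Wei's urn design (sketch): start with m red and m black balls;
    after each draw, replace the ball and add alpha of the opposite color."""
    rng = random.Random(seed)
    red, black = m, m                  # red -> group A, black -> group B
    assignments = []
    for _ in range(n_participants):
        if rng.random() < red / (red + black):
            assignments.append("A")
            black += alpha             # drew red: add black ball(s)
        else:
            assignments.append("B")
            red += alpha               # drew black: add red ball(s)
    return assignments

print(urn_randomization(12, seed=9))
```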

Since the biased coin and urn procedures are less restrictive than block randomization, they can be less susceptible to selection bias, but by the same token they do not control balance as closely. If there are temporal trends in the recruitment pool during enrollment, imbalances can create difficulties. This happened in the Stop Atherosclerosis in Native Diabetics Study (SANDS), a trial comparing intensive intervention for cholesterol and blood pressure with less intensive intervention in people with diabetes [68, 69]. Randomization was done using a stratified urn design, but partway through the trial there was an imbalance in the intervention groups at the same time that new, more aggressive guidelines for lipid-lowering treatment in people with known coronary heart disease were released. The participants in SANDS who met those guidelines could no longer be treated with the less intensive regimen, and no new participants with a history of prior cardiovascular events could be enrolled. Not only was there a possibility of imbalance between the study groups, but the sample size also needed to be reconsidered because of the lower average risk level of the participants.

The most correct analysis of a randomized trial from a theoretical point of view is based on permutation distributions modeling the randomization process. For adaptive procedures this requires that the significance level for the test statistic be determined by considering all possible sequences of assignments which could have been made in repeated experiments using the same allocation rule, assuming no group differences. How well population models approximate the permutation distribution for adaptive designs in general is not well understood [6, 14, 70]. Efron [46] argues that it is probably not necessary to take the biased coin randomization into account in the analysis, especially for larger studies. Mehta and colleagues [71] compared analyses ignoring and incorporating biased coin and urn procedures and concluded that the permutation distribution should not be ignored. Smythe and Wei [30, 46] and Wei and Lachin [46, 66] indicate conditions under which test statistics from urn designs are asymptotically normal, and show that if this randomization method is used, but ignored in the analyses, the p-value will be slightly conservative, that is, slightly larger than if the strictly correct analysis were done. Thus the situation for analysis of biased coin and urn designs is similar to that for permuted block designs. Ignoring the randomization is conservative, though not likely to be excessively conservative. Unlike the permuted block design, however, strong temporal trends can create problems for adaptive randomization, and make the permutation-based analysis more important. Although the biased coin method does not appear to be as widely used, stratified urn procedures have been used successfully, as in the multicenter Diabetes Control and Complication Trial [72, 73].

Minimization

In the Enforcing Underage Drinking Laws (EUDL) randomized community trial, 68 communities in five states were selected to receive either an intervention or a control condition. Matched pairs were created using community characteristics including population size, median family income, percentage of the population currently in college, and percentages of the population that were black, Hispanic, or Spanish-speaking. The specific set of pairings used was determined by sampling from all possible pairings and selecting the set of pairs with the smallest Mahalanobis distance measure. One community in each pair was then randomly assigned to receive the intervention [74]. In this situation, all the communities to be randomized and the key prognostic covariates are known in advance. The treatment and control groups are guaranteed to be well balanced, and randomization provides a foundation for later statistical inference using standard population models. This type of a priori matching is a common feature of group-randomized trials [75].

Unfortunately, this is almost never possible in a clinical setting, where patients typically arrive sequentially and must be treated immediately. To accommodate the sequential nature of participant enrollment, some compromise between manipulation of allocation to achieve balance of prognostic covariates and a less restrictive treatment allocation must be made. Stratified block designs can balance a small number of selected prognostic covariates, and randomization will tend to balance unselected as well as unmeasured covariates, but such methods do not perform well when it is important to balance a large number of prognostic covariates in a small sample. For such settings, procedures which adapt allocation to achieve balance on prognostic covariates have been developed.

The biased coin and urn procedures achieve balance in the number of randomizations to each arm. Other stratification methods are adaptive in the sense that intervention assignment probabilities for a participant are a function of the distribution of baseline covariates for participants already randomized. This concept was suggested by Efron [46] as an extension of the biased coin method and has also been discussed in depth by Pocock and Simon [41] and others [47, 48, 51, 52, 59, 63, 76, 77]. In a simple example, if age is a prognostic factor and one study group has more older participants than the other, this allocation scheme is more likely to randomize the next several older participants to the group which currently has more younger participants. Various measures of imbalance in prognostic factors can be used. In general, adaptive stratification methods incorporate several prognostic factors in making an “overall assessment” of the group balance or lack of balance. Participants are then assigned to a group in a manner which will tend to correct an existing imbalance or cause the least imbalance in prognostic factors. Proschan and colleagues [70] refer to minimization procedures which are deterministic [59, 68] as “strict minimization,” reserving the term minimization for the more general procedure described by Pocock and Simon [41] (see Appendix). Generalizations of this strategy exist for more than two study groups. Development of these methods was motivated in part by the previously described problems with non-adaptive stratified randomization for small studies. Adaptive methods do not have empty or near-empty strata because randomization does not take place within a stratum, although prognostic factors are used. Minimization gives unbiased estimates of treatment effect and slightly increased power relative to stratified randomization [68]. These methods are used especially in clinical trials of cancer, where several prognostic factors need to be balanced and the sample size is typically 100–200 participants.
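
A simplified sketch of marginal-balance minimization in the spirit of Pocock and Simon follows. The imbalance measure (a simple count of prior participants matching the new participant on each factor level) and the assignment probability are illustrative choices, not the only ones discussed in the references:

```python
import random

def minimization_assign(new_participant, enrolled, factors, p_best=0.75, rng=random):
    """Simplified marginal-balance minimization (sketch).

    enrolled: list of (covariate_dict, group) pairs already randomized.
    new_participant: covariate_dict for the incoming participant.
    For each factor, count prior participants in each group sharing the
    new participant's level; favor the group with the smaller total."""
    totals = {"A": 0, "B": 0}
    for factor in factors:
        level = new_participant[factor]
        for covariates, group in enrolled:
            if covariates[factor] == level:
                totals[group] += 1
    if totals["A"] == totals["B"]:
        prob_a = 0.5                   # tie: fair coin
    elif totals["A"] < totals["B"]:
        prob_a = p_best                # A under-represented: favor A
    else:
        prob_a = 1 - p_best
    return "A" if rng.random() < prob_a else "B"

# Hypothetical usage with two prognostic factors
factors = ["age_group", "smoking"]
enrolled = [({"age_group": "40-49", "smoking": "current"}, "A"),
            ({"age_group": "50-59", "smoking": "never"}, "B")]
new = {"age_group": "40-49", "smoking": "never"}
print(minimization_assign(new, enrolled, factors))
```

Setting p_best to 1.0 would make the rule deterministic, corresponding to the “strict minimization” distinction drawn above.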

The major advantage of this procedure is that it protects against a severe baseline imbalance for important prognostic factors. Overall marginal balance is maintained in the intervention groups with respect to a large number of prognostic factors. One disadvantage is that minimization is operationally more difficult to carry out, especially if a large number of factors are considered. Although White and Freedman [63] initially developed a simplified version of the minimization method by using a set of specially arranged index cards, today any small programmable computer can easily carry out the calculations. Unlike blocked, biased coin and urn procedures, however, the calculations for minimization cannot be done in advance. In addition, the population recruited needs to be stable over time, just as for other adaptive methods. For example, if treatment guidelines change during a long recruitment period, necessitating a change in the inclusion or exclusion criteria, the adaptive procedure may not be able to correct imbalances that developed beforehand, as with the SANDS example cited above.

For minimization, assuming that the order of participant enrollment is random, applying the allocation algorithm to all permutations of that order can provide a null distribution for the test statistic [14, 70]. Considerable programming and computing resources are required to do this, however, and biostatisticians generally prefer to use conventional tests and critical values to determine significance levels. Unfortunately, for minimization there are no general theoretical results on how well the standard analysis approximates the permutation analysis [6, 14, 70], though there are some simulation-based results for specific cases [78].

General advice for stratified block randomization and minimization is to include the baseline variables used to determine the allocation as covariates in the analysis [51, 79]. This seems to produce reliable results in most actual trials using stratified block randomization, and in most trials using minimization, though trials using minimization designs rarely examine the permutation distribution. Proschan et al. [70], however, report an example of an actual trial using minimization for which conventional analysis greatly overstated the significance of the intervention effect relative to the permutation distribution. The use of unequal allocation contributed to the discrepancy in this case, but Proschan et al. recommend that the permutation test be used to control type 1 error whenever allocation is done using minimization. Several regulatory guidelines make similar recommendations [80–83].

Despite the appeal of improved balance on more prognostic covariates, most biostatisticians approach minimization and other dynamic allocation plans with caution. As conditions vary considerably from trial to trial, it is expected that the best choice for method of allocation also varies, with the primary goal of avoiding a method which is poorly suited for the given situation.

Response Adaptive Randomization

Response adaptive randomization uses information on participant response to intervention during the course of the trial to determine the allocation of the next participant. Examples of response adaptive randomization models are the Play the Winner [84] and the Two-Armed Bandit [85] models. These models assume that the investigator is randomizing participants to one of two interventions and that the primary response variable can be determined quickly relative to the total length of the study. Bailar [86] and Simon [87] reviewed the uses of these allocation methods. Additional modifications or methods were developed [88–94].

The Play the Winner procedure may assign the first participant by the toss of a coin. The next participant is assigned to the same group as the first participant if the response to the intervention was a success; otherwise, the participant is assigned to the other group. That is, the process calls for staying with the winner until a failure occurs and then switching. The following example illustrates a possible randomization scheme where S indicates intervention success and F indicates failure:

             Participant
Assignment   1   2   3   4   5   6   7   8
Group A      S   F               S   F
Group B              S   S   F               S

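A toy sketch of the Play the Winner rule follows; the outcome probabilities in the example are invented purely for illustration:

```python
import random

def play_the_winner(observe_outcome, n_participants=8, rng=random):
    """Play the Winner (sketch): first assignment by coin toss; stay with
    the same group after a success, switch after a failure.

    observe_outcome: function(group) -> True for success, False for failure,
    standing in for each participant's observed response."""
    group = "A" if rng.random() < 0.5 else "B"
    history = []
    for _ in range(n_participants):
        success = observe_outcome(group)
        history.append((group, "S" if success else "F"))
        if not success:                # failure: switch to the other group
            group = "B" if group == "A" else "A"
    return history

# Toy example: invented success rates of 0.4 for group A and 0.7 for group B
rng = random.Random(2)
print(play_the_winner(lambda g: rng.random() < (0.7 if g == "B" else 0.4), rng=rng))
```
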
Another response adaptive randomization procedure is the Two-Armed Bandit method, which continually updates the probability of success as soon as the outcome for each participant is known. That information is used to adjust the probabilities of being assigned to either group in such a way that a higher proportion of future participants would receive the currently “better” or more successful intervention.
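
One simple illustrative rule in this spirit (a smoothed success-rate allocation, not the exact formulation of reference [85]) might look like:

```python
import random

def bandit_assign(successes, totals, rng=random):
    """Bandit-style allocation (illustrative rule): allocate in proportion
    to smoothed estimates of each group's success rate."""
    est = {g: (successes[g] + 1) / (totals[g] + 2) for g in ("A", "B")}
    prob_a = est["A"] / (est["A"] + est["B"])
    return "A" if rng.random() < prob_a else "B"

# After 10 participants per group, with 7 versus 4 successes observed:
print(bandit_assign({"A": 7, "B": 4}, {"A": 10, "B": 10}))
```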

Both of these response adaptive randomization methods have the intended purpose of maximizing the number of participants on the “superior” intervention. They were developed in response to ethical concerns expressed by some clinical investigators about the randomization process. Although these methods do maximize the number of participants on the “superior” intervention, the possible imbalance will almost certainly result in some loss of power and require more participants to be enrolled into the study than would a fixed allocation with equal assignment probability [92]. A major limitation is that many clinical trials do not have an immediately occurring response variable. They also may have several response variables of interest with no single outcome easily identified as being the one upon which randomization should be based. Furthermore, these methods assume that the population from which the participants are drawn is stable over time. If the nature of the study population should change and this is not accounted for in the analysis, the reported significance levels could be biased, perhaps severely [93]. Here, as before, the data analysis should ideally take into account the randomization process employed. For response adaptive methods, that analysis will be more complicated than it would be with simple randomization. Because of these disadvantages, response adaptive procedures are not commonly used.

One application of response adaptive allocation can be found in a trial evaluating an extracorporeal membrane oxygenator (ECMO) in a neonatal population suffering from respiratory insufficiency [95–99]. This device oxygenates the blood to compensate for the inability or inefficiency of the lungs to achieve this task. In this trial, the first infant was allocated randomly to control therapy. The result was a failure. The next infant received ECMO, which was successful. The next ten infants were also allocated to ECMO, and all outcomes were successful. The trial was then stopped. However, the first infant was much sicker than the ECMO-treated infants. Controversy ensued, and the benefits of ECMO remain unclear. This experience does not offer encouragement to use this adaptive randomization methodology.

Mechanics of Randomization

The manner in which the chosen randomization method is actually implemented is very important [100]. If this aspect of randomization does not receive careful attention, the entire randomization process can easily be compromised, thus voiding any of the advantages for using it. To accomplish a valid randomization, it is recommended that an independent central unit be responsible for developing the randomization process and making the assignments of participants to the appropriate group [27, 101]. For a single center trial, this central unit might be a biostatistician or clinician not involved with the care of the participants. In the case of a multicenter trial, the randomization process is usually handled by the data coordinating center. Ultimately, however, the integrity of the randomization process will rest with the investigator.

Chalmers and colleagues [102] reviewed the randomization process in 102 clinical trials, 57 where the randomization was unknown to the investigator and 45 where it was known. The authors reported that in 14% of the 57 studies, at least one baseline variable was not balanced between the two groups. For the studies with known randomization schedules, twice as many, or 26.7%, had at least one prognostic variable maldistributed. For 43 non-randomized studies, such imbalances occurred four times as often, or in 58%. The authors emphasized that those recruiting and entering participants into a trial should not be aware of the next intervention assignment.

In many cases when a fixed proportion randomization process is used, the randomization schedules are made before the study begins [103–107]. The investigators may call a central location, and the person at that location looks up the assignment for the next participant [103]. Another possibility, used historically and still sometimes in trials involving acutely ill participants, is to have a scheme making available sequenced and sealed envelopes containing the assignments [106]. As a participant enters the trial, she receives the next envelope in the sequence, which gives her the assignment. Envelope systems, however, are more prone to errors and tampering than the former method [27, 101]. In one study, personnel in a clinic opened the envelopes and arranged the assignments to fit their own preferences, accommodating friends and relatives entering the trial. In another case, an envelope fell to the bottom of the box containing the envelopes, thus changing the sequence in which they were opened. Many studies prefer web-based or telephone systems to protect against this problem. In an alternative procedure that has been used in several double-blind drug studies, medication bottles are numbered with a small perforated tab [105]. The bottles are distributed to participants in sequence. The tab, which is coded to identify the contents, is torn off and sent to the central unit. This system is also subject to abuse unless an independent person is responsible for dispensing the bottles. Many clinical trials using a fixed proportion randomization schedule require that the investigator access a website or call the central location to verify that a participant is eligible to be in the trial before any assignment is made. This increases the likelihood that only eligible participants will be randomized.

For many trials, especially multicenter and multinational trials, logistics require a central randomization operations process. Web-based approaches to randomization and other aspects of trial management predominate now [108]. In some cases, the clinic may register a participant by dialing into a central computer and entering data via touchtone, with a voice response. These systems, referred to as Interactive Voice Response Systems (IVRS) or Interactive Web Response Systems (IWRS), are effective and can be used not only to assign the intervention but also to capture basic eligibility data. Before the intervention is assigned, baseline data can be checked to determine eligibility. This concept has been used in a pediatric cancer cooperative clinical trial network [109] and in major multicenter trials [110, 111].

Whatever system is chosen to communicate the intervention assignment to the investigator or the clinic, the intervention assignment should be given as closely as possible to the moment when both investigator and participant are ready to begin the intervention. If the randomization takes place when the participant is first identified and the participant withdraws or dies before the intervention actually begins, a number of participants will be randomized before being actively involved in the study. An example of this occurred in a non-blinded trial of alprenolol in survivors of an acute myocardial infarction [112]. In that trial, 393 participants with a suspected myocardial infarction were randomized into the trial at the time of their admission to the coronary care unit. The alprenolol or placebo was not initiated until 2 weeks later. Afterwards, 231 of the randomized participants were excluded because a myocardial infarction could not be documented, death had occurred before therapy was begun, or various contraindications to therapy were noted. Of the 162 participants who remained, 69 were in the alprenolol group and 93 were in the placebo group. This imbalance raised concerns over the comparability of the two groups and possible bias in reasons for participant exclusion. By delaying the randomization until initiation of therapy, the problem of these withdrawals could have been avoided.

Problems of implementation can also affect the integrity of the randomization procedure. Downs and colleagues [101] relate their experiences with problems caused by errors in programming, incomplete and missing data for stratification variables, and other problems. They also recommend testing of the proposed procedure before the trial begins, and monitoring of the allocation after it begins.

Recommendations

For large studies involving more than several hundred participants, the randomization should be blocked. If a large multicenter trial is being conducted, randomization should be stratified by center. Randomization stratified on the basis of other factors in large studies is usually not necessary, because randomization tends to make the study groups quite comparable for all risk factors. The participants can still, of course, be stratified once the data have been collected and the study can be analyzed accordingly.

For small studies, the randomization should also be blocked, and stratified by center if more than one center is involved. Since the sample size is small, a few strata for important risk factors may be defined to assure that balance will be achieved for at least those factors. For a larger number of prognostic factors, the adaptive stratification techniques should be considered and the appropriate analyses performed. As in large studies, stratified analysis can be performed even if stratified randomization was not done. For many situations, this will be satisfactory.