Introduction

A good randomized controlled trial is needed to support the correct prescription of a treatment or a medical procedure to a patient [1]. However, in operative dentistry, as in many other surgical specialties such as orthopedics, technical difficulties complicate the design and conduct of clinical trials [2, 3]. Operative protocols are often long, and therefore expensive, making it difficult to include many patients; operative parameters (such as the instruments used, the operator, etc.) can vary; and it is difficult to carry out a double-blind trial, as the operator generally cannot be blinded during the procedure.

When a clinical trial cannot be blinded, the protocol and randomization have to be faultless and well documented [4]. Randomization should be carried out as late as possible so that treatment knowledge does not influence the operator’s actions upstream [5]. It should take into account low recruitment, absence of blinding, and operative parameters. When there is limited recruitment, the first aim of the randomization is to obtain a good balance between the treatment or procedure groups being compared [6] in order to optimize estimation of treatment effect and power. Secondly, to overcome absence of blinding, the allocations should not be predictable [7]. Lastly, the parameters which may influence the treatment effect estimate (which are also called "prognostic factors") should be considered. Only a limited number of main prognostic factors should be accounted for at the time of randomization [8–13]. If these factors are well distributed between groups, it is possible to attribute the effect observed to the evaluated treatment or procedure rather than to these factors [14, 15]. Taking the main prognostic factors into account is especially important when few patients are included (<200 patients per trial arm) [16, 17], when the trial is open-label, when subgroup or intermediate analyses are planned, or when the trial is aiming to demonstrate equivalence [18–20].

Most dental trials fall within the above-mentioned types of trials and therefore, stratified blocked randomization or minimization should be implemented; these are the two traditional techniques which achieve the randomization aims described above [21]. Other techniques have been described, but they are more sophisticated and thus more difficult to implement (e.g., Efron’s biased coin design [22], Wei’s urn design [23], Soares and Wu’s big stick design [24], Signorini’s dynamic balanced allocation [25]).

Stratified blocked randomization consists of generating blocks of treatment allocation (e.g., a block of 4: "ABBA", meaning the first patient receives treatment A, the second treatment B, etc.). Blocks can be of varying size, but one block contains an equal number of treatments A and B in order to achieve balance between groups. The order of treatments within a block is randomly generated. A randomization list (which is a block series) is generated for each stratum of patients which contains patients whose prognostic factors are all identical [21]. Stratification is the procedure recommended by regulatory bodies [15]. It works quite well when only a few prognostic factors must be taken into account.

Minimization is a dynamic method that minimizes the imbalance between the number of patients in each treatment group over a number of prognostic factors [26]. The treatment allocated to the next participant depends on the characteristics of the patients already enrolled [5]. Minimization can take into account many factors [27], but it is not recommended by the regulatory bodies. The first method described (by Taves [28] and by Pocock and Simon [29]) was completely deterministic. Thus, the method was considered to be a potential source of bias since operators might be able to predict all the allocations, especially in mono-center and industry-supported trials. Although some authors have proposed introducing some randomness to the minimization [29, 30], authorities still require the use of this method to be justified [10, 15].

Different suggestions have been put forward to help in the effective choice between the two methods. For minimization, the proportion of random allocations recommended ranges from 5 to 30 % depending on the author or the recommendation [30, 31]. It has been generally considered that minimization could easily deal with 10 prognostic factors [27], and Rovers et al. stated that the expected number of patients in each subcategory should be greater than five, to prevent empty cells [17]. For stratification, two recommendations have defined the maximum number of strata: Therneau suggests that the number of strata be less than half the sample size [27], while Kernan et al. suggest keeping the number of strata S below N/(B × 4), where N is the sample size and B is the block size [20]. However, the choice of the block size and the proportion of random allocations included depend on other clinical trial parameters. Statisticians have therefore suggested using computer simulations to choose the allocation method best suited for the proposed trial [32, 33].
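Kernan's rule of thumb can be illustrated with a few lines of code (a sketch of ours, not part of HERMES; the function name is our own):

```python
def max_strata_kernan(n_patients, block_size):
    """Kernan's rule of thumb: the number of strata S should stay
    below N / (B * 4), where N is the sample size and B the block size."""
    return n_patients / (block_size * 4)

# For a 358-patient trial randomized in blocks of 4, the rule allows
# at most 22 strata (358 / 16 = 22.375).
limit = max_strata_kernan(358, 4)
```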

Some authors have used simulated data to compare both methods in terms of balance between treatment arms [27, 29], balance within a factor and within strata [25, 34], and statistical analysis (e.g., performance of conventional model-based statistical inference [33], estimated treatment effect, size of the rejection region, and power [35]). Recently, Zhao et al. compared many designs in terms of imbalance and correct guess probability, but stratification and minimization were not among the designs compared [36]. Real clinical data have also been used to conduct a posteriori simulations and compare randomization methods in terms of balance [37], statistical power [32], and nominal significance level [38]. Brown et al. compared deterministic minimization to minimization incorporating various random elements in terms of prediction rates and balance [30]. However, comparisons between stratified blocked randomization and minimization in terms of predictability and balance are missing.

The aim of this study and of the HERMES software we created was to compare stratification methods (with various block sizes) and minimization methods (including more or less randomness), to analyze the effect of various parameters of clinical trials on this choice (sample size, number of prognostic factors, and operators), and thereby to provide guidance for future investigators. To compare the methods, we computed an indicator of balance between treatments and an indicator of predictability for operators. Specifically, we wanted to choose the optimal method to randomize 358 patients between two groups in a trial comparing ceramic and composite to make inlays/onlays (CECOIA trial, NCT01724827, www.cecoia.fr), while taking into account four prognostic factors (pulp vitality, inlay or onlay, premolar or molar, and operator). We wrote an initial computer simulation program that answered this question. We then wrote a second program (HERMES) that can be applied to other studies, in order to help future clinical trial investigators choose the most suitable randomization method.

Materials and methods

A computer simulation program was coded with Visual Basic for Applications and Excel software. It allocated one of two treatments to patients simulated according to their expected characteristics. Various randomization methods could then be compared in terms of balance between groups and predictability for the operator.

The "Simulation" tab of the Excel interface allowed us to enter the following clinical trial parameters: number of patients to be included; number of simulations to be performed; number of prognostic factors to take into account, associated with the number of levels or values that could be taken by each factor; proportion of patients expected in each prognostic level; number of operators; and the parameters of the allocation methods to be tested.

In order to clarify the minimization and stratification, consider the following simple example of a trial including 10 patients. Two factors were identified: factor 1, with two levels, a and b (e.g., for a surgical trial this could be "smoker" or "non-smoker"); and factor 2, with three levels, a', b', and c' (e.g., the three operators in the trial). The study investigators were expecting 50 % a and 50 % b for factor 1, and 20 % a', 30 % b', and 50 % c' for factor 2. The software randomly generated the patients listed in Table 1, taking into account these expected proportions.

Table 1 Characteristics of the 10 patients randomly generated by the program and treatments allocated to them by the various randomization methods, final numbers, and balance indicators

Minimization was coded as described by Pocock and Simon [29]. In our example, the allocation of the first patient was completely random. Suppose treatment A was assigned. The numbers then obtained for treatments A and B are shown in Fig. 1. If treatment A was assigned to the second patient, the sum of squared imbalances would be 2² + 1² + 1² = 6, whereas it would be 1² + 1² = 2 if treatment B was assigned (see Fig. 1). Therefore treatment B, which minimized the sum of squares, was assigned to the second patient by minimization. This is a fully-deterministic minimization, where allocations depend solely on the characteristics of already-included patients. The allocations for the 10 patients are listed in Table 1.
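A minimal Python sketch of this deterministic step follows (our own illustrative code, not the HERMES source). Here the imbalance term for each of the new patient's factor levels is the squared difference in treatment counts; the exact set of imbalance terms can vary between implementations, and ties are broken at random:

```python
import random

def minimization_choice(included, new_patient, factors, treatments=('A', 'B')):
    """Return the treatment minimizing the sum of squared imbalances over
    the new patient's prognostic-factor levels (Pocock-Simon style).

    `included` is a list of (characteristics_dict, treatment) pairs for
    the patients already enrolled."""
    sums = {}
    for candidate in treatments:
        total = 0
        for f in factors:
            level = new_patient[f]
            # treatment counts among included patients sharing this level,
            # assuming `candidate` is allocated to the new patient
            counts = {t: sum(1 for chars, given in included
                             if chars[f] == level and given == t)
                      for t in treatments}
            counts[candidate] += 1
            total += (counts['A'] - counts['B']) ** 2
        sums[candidate] = total
    best = min(sums.values())
    return random.choice([t for t in treatments if sums[t] == best])
```

With the first patient (a, b') already allocated to A, a second patient sharing level a deterministically receives B, as in the worked example above.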

Fig. 1

Treatment allocation to patient 2 by minimization. Numbers after inclusion of patient 1 (a, b'), Numbers if treatment A is allocated to patient 2 (a, c'), Numbers if treatment B is allocated to patient 2 (a, c')

It is also possible to introduce some randomness so that the minimization is less predictable. Thus, according to the percentage X ∈ [0, 50 %] of randomness chosen, the treatment allocated will be the one dictated by minimization in (100 − X) % of cases, and the other treatment will be allocated in X % of cases. This was programmed in a simple way by generating a random number U according to the uniform distribution on [0, 1]. The allocation was then made depending on whether U ≤ X or U > X. For example, in the case of minimization with 30 % randomness (X = 30 %), B was the treatment that minimized the sum of squared imbalances for patient 5. The random number generated was 0.23. As this is less than 0.3, treatment A was finally assigned (see Table 1). Finally, it was also possible to introduce the treatment (A or B) as a factor in the minimization, in order to keep a better balance overall.
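The random element can be sketched as follows (illustrative code of ours; `preferred` stands for the treatment dictated by minimization):

```python
import random

def allocate_with_randomness(preferred, other, x, rng=random):
    """With probability 1 - x keep the treatment dictated by minimization;
    with probability x allocate the other treatment (x in [0, 0.5]).
    x = 0 reduces to the fully deterministic minimization."""
    u = rng.random()          # uniform draw on [0, 1]
    return other if u <= x else preferred
```

For patient 5 in the example, `preferred` was B, x = 0.3, and the draw U = 0.23 ≤ 0.3 switched the allocation to A.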

For the stratified blocked randomization, a blocked randomization list was generated for each stratum. The strata number was equal to the product of the number of levels of each factor (here, 2 levels for factor 1 × 3 levels for factor 2 = 6 strata). Initially, one block was generated by the software for each stratum (see Table 2). Then, a new block was generated for all strata each time a randomization list had been entirely allocated. Take the example of randomization in blocks of two: the program assigned the first patient (a, b') the first treatment on the corresponding randomization list (a, b'), here treatment B (see Table 2), and so on (see results in Table 1). For this 10-patient trial, it was not necessary to generate additional blocks. Similarly for randomization in blocks of four, the first patient (a, b') was assigned treatment A (see Table 2), and the following allocations are shown in Table 1.
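A simplified sketch of the stratified blocked allocation (our code, not the HERMES source; for brevity it extends each stratum's list on demand, rather than regenerating a block for all strata at once as described above):

```python
import itertools
import random

def random_block(size):
    """A balanced block of `size` allocations in random order, e.g. 'ABBA'."""
    block = ['A', 'B'] * (size // 2)
    random.shuffle(block)
    return block

class StratifiedBlockRandomizer:
    def __init__(self, factor_levels, block_size):
        # one randomization list per stratum (= combination of factor levels)
        self.block_size = block_size
        self.lists = {s: [] for s in itertools.product(*factor_levels)}
        self.next_index = {s: 0 for s in self.lists}

    def allocate(self, stratum):
        """Next treatment on the stratum's list, extending it when exhausted."""
        if self.next_index[stratum] == len(self.lists[stratum]):
            self.lists[stratum].extend(random_block(self.block_size))
        treatment = self.lists[stratum][self.next_index[stratum]]
        self.next_index[stratum] += 1
        return treatment
```

With levels (a, b) for factor 1 and (a', b', c') for factor 2, the randomizer holds 2 × 3 = 6 lists, and each completed block leaves its stratum perfectly balanced.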

Table 2 Randomization lists generated for each stratum for the stratified randomization in blocks of 2 (upper side) and blocks of 4 (lower side)

Indicators used

For each set of patients, an imbalance indicator was calculated, which corresponded to the absolute value of the difference in numbers between treatments A and B, divided by the number of patients included (to allow comparison of the balance of one trial with another). The imbalance indicators of the various randomization methods, using our example, are listed in Table 1. An indicator of the within-factor imbalance was computed; it was the mean imbalance within prognostic factor levels. Each prognostic level imbalance was the absolute value of the difference in numbers between treatments A and B, divided by the number of patients included in the prognostic level.
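These two indicators can be written compactly (illustrative code of ours, shown here for one factor at a time):

```python
def overall_imbalance(allocations):
    """|nA - nB| divided by the number of patients included."""
    return abs(allocations.count('A') - allocations.count('B')) / len(allocations)

def within_factor_imbalance(patients, allocations, factor):
    """Mean, over the levels of one factor, of the level-wise imbalance
    |nA - nB| / n_level."""
    by_level = {}
    for chars, treatment in zip(patients, allocations):
        by_level.setdefault(chars[factor], []).append(treatment)
    return sum(overall_imbalance(ts) for ts in by_level.values()) / len(by_level)
```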

Predictability was computed by operator (or center). We adopted four different methods to mimic how operators may predict treatment allocation: prediction based on knowledge of the last allocation only; of the last 3; of the last 5; or of all allocations (i.e., the operator had written down or had access to all his allocations) [30, 40, 41]. Consider our example again, with a', b', and c' indicating the three operators, who remembered only their last allocation, the allocations having been generated by deterministic minimization. Predictability was calculated from the second inclusion. The operator predicted that his new patient would receive the treatment other than the previously allocated one. Thus operator b' predicted his second patient would receive treatment B, since his first had received treatment A (see Table 3). His second patient did receive treatment B. However, he predicted his third patient would receive treatment A whereas that patient was allocated treatment B, and so on.

Table 3 Calculation of the predictability indicator if the operator remembered his last allocation only

The predictability indicator was calculated as the sum of cases where operators correctly guessed the treatment allocated to their patients, divided by the number of guesses. For our example, predictability indicators of each allocation method are shown in Table 3. If the operator remembered 3, 5, or all of his inclusions, he predicted his next patient would receive the treatment least frequently allocated among those remembered.
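A sketch of this computation (our code; when the remembered allocations are tied we arbitrarily guess B, a detail the rule above leaves open):

```python
from collections import defaultdict

def predictability(patients, allocations, memory=None):
    """Fraction of correct guesses, each operator predicting that his next
    patient will receive the treatment least represented among the last
    `memory` allocations he remembers (memory=None: all of them)."""
    history = defaultdict(list)      # operator -> his past allocations
    correct = guesses = 0
    for chars, treatment in zip(patients, allocations):
        op = chars['operator']
        recalled = history[op] if memory is None else history[op][-memory:]
        if recalled:                 # guessing starts at the second inclusion
            guess = 'A' if recalled.count('A') < recalled.count('B') else 'B'
            guesses += 1
            correct += (guess == treatment)
        history[op].append(treatment)
    return correct / guesses if guesses else 0.0
```

With `memory=1` this reproduces the "last allocation only" operator, who always bets on the treatment he did not just see.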

The number of simulations was the number of samples (like the one in Table 1) simulated. Several minimization and/or stratification methods can be used on each sample. The simulations were activated by the "Launch" button on the "Simulation" tab of the Excel interface. The mean and standard deviation of the indicators of balance and predictability were calculated and these results appeared after a few seconds on the "Results" tab of the Excel interface.

Results

The software developed provides access to the predictability and balance of a given randomization method. To present the results, we chose a reference situation (Trial 0), which we varied depending on the number of patients included, the number of selected prognostic factors, the number of operators, and the distribution of subjects within factors (Table 4). We performed 10,000 simulations for each trial (Appendix 1).

Table 4 Characteristics of simulated trials (patients, factors, frequencies, operators, and strata)

Trial 0

Predictability and balance results for Trial 0, according to the method of randomization selected, are shown in Table 5. Overall imbalance and within-factor imbalance increased when minimization included more randomness (i.e., when X increased), and when the block size increased for stratification. Predictability decreased as minimization included more random allocations or as the block size increased for stratification. Predictability increased when the operator remembered more allocations and it was maximal if he had written down all his allocations (Table 5). We then considered the situation of the operator remembering his last five allocations, because this seemed to be the most likely situation to occur in real-life practice [40].

Table 5 Imbalance and predictability indicators of Trial 0 for various randomization methods

Influence of parameter variation on predictability and imbalance in trial 0

Sample size

The graphs in Fig. 2 show the effect of the number of patients. For a given randomization method, when the number of patients included increased, the imbalance decreased, as did predictability. For a small trial (cf. Trial 1), the properties of the various methods differed greatly.

Fig. 2

Effect of change in sample size on imbalance and predictability indicators of various randomization methods (operator remembered his last five allocations)

Number of prognostic factors

The graphs in Fig. 3 show the effect of the number of prognostic factors. For stratification, the imbalance increased greatly when the number of prognostic factors increased, as the number of strata increased more quickly. For minimization, however, imbalance increased only slightly, while its predictability decreased.

Fig. 3

Effect of the number of prognostic factors on imbalance and predictability indicators of various randomization methods (operator remembered his last five allocations)

Number of operators

The graphs in Fig. 4 show the effect of the number of operators. Imbalance increased with the number of operators. An increasing number of operators had a negative impact on both methods, but affected stratification more.

Fig. 4

Effect of the number of operators on imbalance and predictability indicators of various randomization methods (operator remembered his last five allocations)

Subject distribution between prognostic factors

The graphs in Fig. 5 show the effect of a more or less unequal distribution of subjects between prognostic levels. Unequal distribution of subjects favored minimization (cf. Trial 0 vs. Trial 7).

Fig. 5

Effect of subject distribution between classes/values of prognostic factors on imbalance and predictability indicators of various randomization methods (operator remembered his last five allocations)

Note that including treatment as a minimization factor almost always improved the two indicators (balance and predictability) simultaneously, except when there was only one operator (Trial 5) or when the number of subjects was poorly distributed between operators (Trial 8). Finally, note that in our examples, imbalance was always minimal for a deterministic minimization including treatment as a minimization factor (Table 5, Figs. 2, 3, 4, and 5).

Discussion

In this study, we applied computer simulation of stratification and minimization randomization techniques to a balanced two-arm study model, with the aim of discerning which technique was most appropriate. We found that minimization was remarkably adaptable, and that predictability and balance were inversely correlated.

Choosing the right randomization method is important because it will affect the results of the clinical trial in terms of balance between groups, of predictability for the operator, and of statistical analysis. Some rules, such as Kernan’s rule [20], already exist to help investigators make that choice. However, we saw, for example in Fig. 5, that Trial 0 was more favorable to stratification than Trial 8, although Kernan’s index was the same for both trials (3.7, Table 4). These rules thus lack subtlety in the choice of the randomization method, because many other parameters can vary within a trial and influence the decision (e.g., the number of subjects, the number of prognostic factors to consider, the number of operators, and the distribution of subjects between factor levels). For this reason, statisticians have suggested performing simulations. However, up until now, investigators have not had freely-available simulation tools; this situation prompted us to create the HERMES software.

The first main result emerging from our study is that minimization was more adaptable and worked better in complex cases. This result confirms those of previous studies [9, 27, 35, 37, 38]. When the number of patients decreased or when the number of prognostic factors or the number of operators increased, the imbalance of stratification increased much more than that of minimization.

The second main result is that there was generally an inverse correlation between predictability and imbalance: when a less predictable method was wanted, its imbalance increased and vice versa. This idea has been mentioned or suggested by some authors [33, 38] but not as clearly demonstrated [30] until recently [36]. A trade-off between predictability and imbalance is therefore necessary as the perfect method (i.e., 0 % imbalance and 50 % predictability) does not exist.

This trade-off highlights a limitation of our study: although the software computed the predictability and imbalance of various methods, choosing the best method would still often be difficult. The imbalance–predictability graphs are helpful because various methods can be compared in terms of these two properties simultaneously; a method whose marker is located at the lower left of another is better. However, choosing the best method is less obvious. An analysis analogous to that of receiver operating characteristic curves would lead us to choose the method closest to the origin of the graph [42]. However, the graph is modified by the scales attributed to imbalance and predictability. Ultimately, the choice of the best method on the graph will depend on the trial: predictability is a less fundamental criterion for a double-blind trial than it is for an open-label trial, whereas balance will be less critical if a large difference in effect size between the treatments or procedures being compared is expected.

For example, for Trial 0, in the case of a double-blind trial, we may decide to perform a deterministic minimization with the treatment as a minimization factor. If it were an open-label trial, we would prefer a minimization with a random factor of X = 10 or 20 % or even more. Note that for this trial, stratification by blocks of two was not far below these minimizations. We could therefore decide in this case to adopt block stratification because it provides several advantages: it is easier to implement [20, 43] (the sequence can be generated in advance [44]), it is recommended by the authorities, and it allows all interactions that may exist between factors to be taken into account. This latter property weakens it when many factors must be considered, because some intersections have very few patients. However, it is also a strong point if interactions between prognostic factors are suspected or if subgroup analyses are planned [34].

The generalizability of our program is limited in two ways. Firstly, it does not take into account possible interactions between prognostic factors at the time of patient simulation. However, these interactions can rarely be quantified before the start of a dental trial. If significant interactions between factors are suspected, the results of stratification should be compared to those of a minimization, taking into account these interactions. For example, if an interaction is suspected between factor 1 with 2 levels (a and b) and factor 2 with 3 levels (a', b', and c'), patients can be simulated (as if the interaction existed) by 6 levels of minimization (aa', ab', ac', ba', bb', and bc'), instead of by the 5 levels which a minimization without interaction would have included (a, b, a', b', and c').

Secondly, we restricted the comparison of the different randomization methods to the case of a trial with two balanced arms. However, we wrote the program in Visual Basic for Applications so that the code is accessible and can be changed, if necessary, to adapt it to a wide variety of clinical trials.

A final limitation of our work is that it compared different methods of stratification and minimization on criteria of balance and predictability, but not on the statistical analysis of results. However, to do so would require making assumptions about trial outcomes. This would not be straightforward in our field, and we believe that existing data are preferable for such comparisons. Conclusions regarding the consequences of the randomization method on the statistical analysis obtained on real datasets can be found in Appendix 2.

In conclusion, the HERMES software does compare stratification and minimization in terms of predictability and balance, but it does not entirely solve the choice of the most suitable method for a trial. The right compromise between predictability and imbalance remains to be found, but the software helps to justify this choice based on concrete reasoning. It is available for free download at this internet address: chabouis.fr/helene/hermes.