An urn model to construct an efficient test procedure for response adaptive designs

Ghiglietti, Andrea; Paganoni, Anna Maria

doi:10.1007/s10260-015-0314-y


Published: 21 May 2015

Volume 25, pages 211–226, (2016)

Statistical Methods & Applications


Abstract

We study the statistical performance of different tests for comparing the mean effect of two treatments. Given a reference classical test ${\mathcal {T}}_0$, we determine which sample size and proportion allocation guarantee to a test ${\mathcal {T}}$, based on response-adaptive design, to be better than ${\mathcal {T}}_0$, in terms of (a) higher power and (b) fewer subjects assigned to the inferior treatment. The adoption of a response-adaptive design to implement the random allocation procedure is necessary to ensure that both (a) and (b) are satisfied. In particular, we propose to use a Modified Randomly Reinforced Urn design and we show how to perform the model parameters selection for the purpose of this paper. Then, the opportunity of relaxing some assumptions on treatment response distributions is presented. Results of simulation studies on the test performance are reported and a real case study is analyzed.


1 Introduction

In this paper, we focus on the statistical performance of a hypothesis test that compares the means of the responses to two treatments. The procedure is illustrated within the framework of clinical trials, although the generality of the mathematical setting allows the method to be applied more widely. We consider a clinical trial to compare the mean effect of two competing treatments, say $R$ and $W$, and a classical test ${\mathcal {T}}_0=(p_0,n_0)$ that involves $n_0$ patients with a fixed proportion $p_0$ of subjects allocated to treatment $R$. In Sect. 2 we consider a test ${\mathcal {T}}$ based on a response-adaptive design with a different sample size and allocation proportion, and we compare the statistical performance of the two tests. In particular, the analysis aims at determining which characteristics guarantee that a test ${\mathcal {T}}$ performs better than ${\mathcal {T}}_0$, in terms of (a) higher power and (b) fewer subjects assigned to the inferior treatment. Response-adaptive designs are very attractive in a clinical setting, since they aim at achieving simultaneously two different goals, one statistical and one ethical: (i) collecting evidence to determine the superior treatment, and (ii) minimizing the number of subjects allocated to the inferior treatment. For a complete literature review on response-adaptive designs see Lachin and Rosenberger (2002), Hu and Rosenberger (2006), Flournoy et al. (2012), Atkinson and Biswas (2014). The adaptive procedure we propose is the Modified Randomly Reinforced Urn (MRRU) design introduced in Aletti et al. (2013).

A wide class of response-adaptive randomized designs is based on urn models, which are classical tools to guarantee a randomized device [Rosenberger (2002), Cheung et al. (2006)]. Asymptotic results concerning urn models with an irreducible mean reinforcement matrix can be found in Rosenberger (2002), Bai et al. (2002), Janson (2004), Bai and Hu (2005), Cheung et al. (2006). Recently, in Laruelle and Pagès (2013) the randomized urn designs proposed in Bai et al. (2002), Bai and Hu (2005) have been studied by applying stochastic approximation algorithms, and asymptotic results have been obtained using these techniques. Moreover, a general class of immigrated urn models has been proposed in Cheung et al. (2011), which provides a unified view of various urn models with irreducible mean reinforcement matrix. However, all these urn processes are based on the assumption that the replacement matrix is irreducible, which is not satisfied by the Randomly Reinforced Urn (RRU) studied in May et al. (2005), Muliere et al. (2006), Paganoni and Secchi (2007), Flournoy and May (2009), which has a diagonal mean reinforcement matrix. The RRU models were introduced in Durham and Yu (1990) for binary responses, applied to dose-finding problems in Durham et al. (1996), Durham et al. (1998), and then extended to the case of continuous responses in Beggs (2005), Muliere et al. (2006). An interesting result concerning RRU models states that the probability of allocating units to the superior treatment converges to one as the sample size increases. This property is very attractive from an ethical point of view. However, because of this asymptotic behavior, RRU models are not in the class of designs targeting a proportion in $(0,1)$, which is usually fixed in advance or computed to optimize some suitable criterion. Hence, the asymptotic properties of these procedures presented in the literature [see for instance Melfi and Page (2000), Melfi et al. (2001)] are not straightforwardly fulfilled by the RRU designs.

So, in Aletti et al. (2013) the urn scheme of the RRU model has been conveniently modified, in order to construct a new urn model, called Modified Randomly Reinforced Urn (MRRU), that asymptotically targets a fixed allocation proportion in $(0,1)$, and at the same time reduces the number of subjects allocated to the inferior treatment. This goal has been achieved by introducing two thresholds $\delta $ and $\eta $, with $0<\delta \le \eta <1$, for the urn proportion. These parameters modify the reinforcement process. A brief discussion of the MRRU design is reported at the end of Sect. 2. In general, $\delta $ represents the desired asymptotic proportion of subjects to allocate to $R$ when $W$ is the superior treatment, i.e. $m_R<m_W$, while $\eta $ is the desired asymptotic proportion of subjects to allocate to $R$ when $R$ is the superior treatment, i.e. $m_R>m_W$. The asymptotic properties of the MRRU studied in Aletti et al. (2013) and Ghiglietti and Paganoni (2014), together with results about adaptive estimators proved in Melfi et al. (2001), are crucial for the procedure presented in this paper.

In Sect. 2 we describe the framework we deal with, considering the case of Gaussian responses with known variances, and we discuss the selection of the parameters for the MRRU model. In Sect. 3 some assumptions on the distributions of the reinforcements required in Sect. 2 are relaxed: specifically, Gaussian responses with unknown variances and Exponential and Bernoulli responses are considered. Section 4 gathers some simulation studies and Sect. 5 contains the analysis of a real case study.

A short conclusion ends the paper (Sect. 6). Data analysis and simulations have been carried out using the statistical software R (R Development Core Team 2011).

2 The proportion-sample size space

Consider the classical hypothesis test for comparing the means of two Gaussian samples with known variances. Consider a classical procedure that assigns a proportion $p_0$ of patients to treatment $R$ and $1-p_0$ to treatment $W$, with $p_0\in (0,1)$. Let $n_0\in \mathbb {N}$ be the total number of subjects involved in the experiment, chosen as the sample size that guarantees a minimum power ($\beta _0$) at a specific difference of the means ($\pm \Delta _0$). In what follows, $n_{0,R}:=n_0p_0$ and $n_{0,W}:=n_0(1-p_0)$ indicate the number of subjects assigned to treatments $R$ and $W$, respectively. Moreover,

responses to treatment $R$: $M_1,M_2,\ldots ,M_{n_{0,R}}$ i.i.d. $\sim \mathcal {N}(m_R,\sigma _R^2)$.
responses to treatment $W$: $N_1,N_2,\ldots ,N_{n_{0,W}}$ i.i.d. $\sim \mathcal {N}(m_W,\sigma _W^2)$.

For the classical hypothesis test

$$\begin{aligned} H_0:\ m_R-m_W=0\ \ \ \ \ \ \ \ \ vs\ \ \ \ \ \ \ \ \ H_1:\ m_R-m_W\ne 0 \end{aligned}$$

(1)

the critical region of the Likelihood Ratio Test (LRT) ${\mathcal {T}}_0$ with level $\alpha $ is:

$$\begin{aligned} R_{\alpha }=\left\{ \left| \overline{M}_{n_{0,R}}-\overline{N}_{n_{0,W}}\right| > \sqrt{\frac{\sigma _R^2}{n_{0,R}} + \frac{\sigma _W^2}{n_{0,W}} } z_{\frac{\alpha }{2}} \right\} \end{aligned}$$

(2)

where $\overline{M}_{n_{0,R}}=\sum _{i=1}^{n_{0,R}}M_i/n_{0,R}$ and $\overline{N}_{n_{0,W}}=\sum _{i=1}^{n_{0,W}}N_i/n_{0,W}$ and $z_{\frac{\alpha }{2}}$ is the quantile of order $1-\alpha /2$ of a standard normal distribution. The power function of the test ${\mathcal {T}}_0$ is the following

$$\begin{aligned} \beta _{{\mathcal {T}}_0}(\Delta )=P\left( Z < -z_{\frac{\alpha }{2}} - \frac{\Delta }{\sqrt{ \frac{\sigma _R^2}{n_{0,R}} + \frac{\sigma _W^2}{n_{0,W}} }}\right) + P\left( Z > z_{\frac{\alpha }{2}} - \frac{\Delta }{\sqrt{ \frac{\sigma _R^2}{n_{0,R}} + \frac{\sigma _W^2}{n_{0,W}} }}\right) , \end{aligned}$$

where $\Delta =m_R-m_W$. The test ${\mathcal {T}}_0$ can be represented in the space $((0,1)\times \mathbb {N})$, which we call the proportion-sample size space, by the pair $(p_0,n_0)$. Any other point $(p,n)$ in the same space represents a test ${\mathcal {T}}$ with sample size equal to $n$ and allocation proportion to treatment $R$ equal to $p$. The goal is to identify regions of this space characterized by tests ${\mathcal {T}}$ performing better than ${\mathcal {T}}_0$, i.e.

(a)
${\mathcal {T}}$ has a power function $\beta _{{\mathcal {T}}}(\Delta )$ uniformly higher than the power function of ${\mathcal {T}}_0$, i.e. $\beta _{{\mathcal {T}}_0}(\Delta )$;
(b)
${\mathcal {T}}$ assigns fewer patients to the inferior treatment than ${\mathcal {T}}_0$.

To achieve condition (a) we impose the following constraint

$$\begin{aligned} \beta _{{\mathcal {T}}}(\Delta ) \ge \beta _{{\mathcal {T}}_0}(\Delta )\ \ \ \forall \Delta \in \mathbb {R} \ \Leftrightarrow \ \ \frac{\sigma _R^2}{n p} + \frac{\sigma _W^2}{n(1-p)} \le \frac{\sigma _R^2}{n_0p_0} + \frac{\sigma _W^2}{n_0(1-p_0)}. \end{aligned}$$

(3)

From (3) we compute the function $n_{\beta }$ that separates two regions in the proportion-sample size space

$$\begin{aligned} n_{\beta }(p)\ =\ \left( \frac{\rho ^2}{p} + \frac{(1-\rho )^2}{1-p}\right) \left( \frac{\rho ^2}{n_0p_0} + \frac{(1-\rho )^2}{n_0(1-p_0)}\right) ^{-1} \end{aligned}$$

(4)

where $\rho $ indicates the Neyman allocation proportion $\frac{\sigma _R}{\sigma _R+\sigma _W}$, that, for any fixed sample size, provides the test with highest power.
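As a rough numerical illustration of the power function and of the boundary (4), the following Python sketch evaluates both with the standard library only. The function names `power` and `n_beta` and the numerical inputs ($n_0=100$, $p_0=0.5$, unit variances) are hypothetical choices made for the example and are not taken from the analysis in this paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power(delta, n, p, sigma_r, sigma_w, alpha=0.05):
    """Two-sided power of the test (p, n) at mean difference delta."""
    se = sqrt(sigma_r**2 / (n * p) + sigma_w**2 / (n * (1 - p)))
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    lower_tail = STD_NORMAL.cdf(-z - delta / se)
    upper_tail = 1 - STD_NORMAL.cdf(z - delta / se)
    return lower_tail + upper_tail


def n_beta(p, n0, p0, sigma_r, sigma_w):
    """Sample size at which (p, n) has the same power as (p0, n0), as in (4)."""
    rho = sigma_r / (sigma_r + sigma_w)  # Neyman allocation proportion
    num = rho**2 / p + (1 - rho)**2 / (1 - p)
    den = rho**2 / (n0 * p0) + (1 - rho)**2 / (n0 * (1 - p0))
    return num / den


# Hypothetical reference test: n0 = 100, p0 = 0.5, unit variances.
n0, p0 = 100, 0.5
n_star = n_beta(0.3, n0, p0, 1.0, 1.0)
# On the curve n_beta the two power functions coincide at every delta.
print(n_star, power(1.0, n_star, 0.3, 1.0, 1.0), power(1.0, n0, p0, 1.0, 1.0))
```

Any sample size above `n_star` at allocation $p=0.3$ would then give a power uniformly higher than the hypothetical reference test.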

In Fig. 1, points above the curve $n_{\beta }$ (red line) indicate tests ${\mathcal {T}}$ with a power uniformly higher than ${\mathcal {T}}_0$. To satisfy condition (b) we distinguish two different cases, depending on which treatment is superior:

if the superior treatment is $R$, we impose
$$\begin{aligned} n(1-p)<n_0(1-p_0)\ \ \Leftrightarrow \ \ p > 1-\frac{n_0}{n}(1-p_0); \end{aligned}$$
(5)
if the superior treatment is $W$, we impose
$$\begin{aligned} np<n_0p_0\ \ \Leftrightarrow \ \ p<\frac{n_0}{n}p_0. \end{aligned}$$
(6)

Both these constraints are depicted in blue in the proportion-sample size space. Below each of these lines, the corresponding constraint, (5) or (6), is satisfied. In conclusion, we divide the $\textit{proportion-sample size}$ space into three regions:

Region $A$:
$$\begin{aligned} A = \left\{ (x,y)\in (0,1)\times (0,\infty )\ :\ n_{\beta }(x)<y<\frac{p_0}{x}n_0\right\} \end{aligned}$$
tests ${\mathcal {T}}\in A$ having a power uniformly higher and assigning fewer patients to treatment $R$ than ${\mathcal {T}}_0$.
Region $B$:
$$\begin{aligned} B = \left\{ (x,y)\in (0,1)\times (0,\infty )\ :\ y>\max \left\{ \frac{p_0}{x};\frac{1-p_0}{1-x}\right\} \cdot n_0 \right\} \end{aligned}$$
tests ${\mathcal {T}}\in B$ having a power uniformly higher and assigning more patients to both treatments than ${\mathcal {T}}_0$.
Region $C$:
$$\begin{aligned} C\ =\ \left\{ (x,y)\in (0,1)\times (0,\infty )\ :\ n_{\beta }(x)<y<\frac{1-p_0}{1-x}n_0\ \right\} \end{aligned}$$
tests ${\mathcal {T}}\in C$ having a power uniformly higher and assigning fewer patients to treatment $W$ than ${\mathcal {T}}_0$.
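The partition into the regions $A$, $B$ and $C$ can be sketched in a few lines of Python. The helper `region` and the numerical values used below are hypothetical and only illustrate how a point $(p,n)$ could be classified; points with power not uniformly higher than ${\mathcal {T}}_0$, or lying exactly on a boundary line, are returned as `None`.

```python
def n_beta(p, n0, p0, sigma_r, sigma_w):
    """Boundary (4): sample size giving the same power as (p0, n0)."""
    rho = sigma_r / (sigma_r + sigma_w)
    num = rho**2 / p + (1 - rho)**2 / (1 - p)
    den = rho**2 / (n0 * p0) + (1 - rho)**2 / (n0 * (1 - p0))
    return num / den


def region(p, n, n0, p0, sigma_r, sigma_w):
    """Return 'A', 'B' or 'C' for the point (p, n), or None otherwise."""
    if n <= n_beta(p, n0, p0, sigma_r, sigma_w):
        return None  # power is not uniformly higher than that of T0
    n_r, n_w = n * p, n * (1 - p)          # subjects on R and W under T
    n0_r, n0_w = n0 * p0, n0 * (1 - p0)    # subjects on R and W under T0
    if n_r < n0_r:
        return "A"  # fewer patients on R
    if n_w < n0_w:
        return "C"  # fewer patients on W
    if n_r > n0_r and n_w > n0_w:
        return "B"  # more patients on both treatments
    return None     # exactly on a boundary line


# Hypothetical reference test (p0, n0) = (0.5, 100) with unit variances.
for p in (0.10, 0.35, 0.50, 0.65):
    print(p, region(p, 125, 100, 0.5, 1.0, 1.0))
```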

Hence, a test ${\mathcal {T}}=(p,n)$ is considered better than ${\mathcal {T}}_0$ if $(p,n)\in A$ and the superior treatment is $W$, or if $(p,n)\in C$ and the superior treatment is $R$. Unfortunately, the experimenter does not know which treatment is superior before the trial is conducted. For this reason, it is reasonable to use a response-adaptive design to construct the test ${\mathcal {T}}$. Let us introduce a vector $(X_1,X_2,\ldots ,X_n)\in \{0;1\}^n$ composed of the allocations to the treatments according to the adaptive design, i.e. $X_i=1$ if subject $i$ receives treatment $R$ and $X_i=0$ if subject $i$ receives treatment $W$. The quantities $N_R(n) = \sum _{i=1}^n X_i$ and $N_W(n) = \sum _{i=1}^n (1-X_i)$ are the numbers of patients allocated to treatments $R$ and $W$, respectively. Let us define the adaptive estimators based on the responses collected up to time $n$

$$\begin{aligned} \overline{M}(n)=\frac{\sum _{i=1}^n X_i M_i}{N_R(n)} \qquad \text {and} \qquad \overline{N}(n)=\frac{\sum _{i=1}^n (1-X_i) N_i}{N_W(n)}. \end{aligned}$$

(7)

Then, the test ${\mathcal {T}}$ is defined by the following critical region

$$\begin{aligned} R_{\alpha }^{adaptive }=\left\{ |\overline{M}(n)-\overline{N}(n)| > \sqrt{\frac{\sigma _R^2}{N_R(n)} + \frac{\sigma _W^2}{N_W(n)} } z_{\frac{\alpha }{2}} \right\} \end{aligned}$$

(8)

whose properties (in terms of power, level and asymptotic distribution of the test statistic) depend on the type of adaptive design adopted in the trial.
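The decision rule defined by (7) and (8) can be written compactly. In the sketch below, the function name `adaptive_test` and the data layout (one observed response per subject, paired with its allocation indicator) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def adaptive_test(x, y, sigma_r, sigma_w, alpha=0.05):
    """Return True if the critical region (8) rejects H0.

    x[i] is 1 if subject i received R and 0 if it received W;
    y[i] is the response observed for subject i.
    """
    n_r = sum(x)
    n_w = len(x) - n_r
    if n_r == 0 or n_w == 0:
        raise ValueError("both treatments need at least one subject")
    # Adaptive estimators (7) of the two means.
    mean_r = sum(yi for xi, yi in zip(x, y) if xi == 1) / n_r
    mean_w = sum(yi for xi, yi in zip(x, y) if xi == 0) / n_w
    se = sqrt(sigma_r**2 / n_r + sigma_w**2 / n_w)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(mean_r - mean_w) > se * z


# Hypothetical data: four subjects, alternating allocations.
print(adaptive_test([1, 0, 1, 0], [10.0, 0.0, 10.0, 0.0], 1.0, 1.0))
```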

We propose to adopt as response-adaptive design the Modified Randomly Reinforced Urn (MRRU) design introduced in Aletti et al. (2013). In this model, an urn containing red and white balls is sequentially sampled and subjects are assigned to the treatments corresponding to the colors of the sampled balls. After each allocation, the urn is virtually reinforced with a random real number of balls that depends on the response of the patient just assigned. We call $Z_n$ the proportion of red balls in the urn, which is also the probability of assigning the $(n+1)$-th patient to treatment $R$. We reinforce the number of red (white) balls only if $Z_n<\eta $ ($Z_n>\delta $), where $0<\delta \le \eta <1$ are fixed parameters. In Aletti et al. (2013) and Ghiglietti and Paganoni (2014) theoretical results concerning the MRRU model have been proved and the asymptotic behavior of the urn process has been discussed. In particular, when $m_R\ne m_W$ it has been proved that

$$\begin{aligned} \lim _{n\rightarrow \infty }Z_n\ =\ \lim _{n\rightarrow \infty }\frac{N_R(n)}{n}\ =\ \eta \mathbf {1}_{\{m_R>m_W\}}+\delta \mathbf {1}_{\{m_R<m_W\}}\ \ \ a.s. \end{aligned}$$

(9)
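The urn dynamics described above can be simulated with a short Python sketch. It is only a minimal illustration under stated assumptions: the reinforcement is taken to be the Gaussian response itself, truncated at zero, and the function name `simulate_mrru`, the initial composition and all numerical values are hypothetical.

```python
import random


def simulate_mrru(n, m_r, m_w, sigma_r, sigma_w, delta, eta,
                  z0=0.5, d0=10.0, seed=0):
    """Simulate n allocations of a thresholded urn; return (Z_n, N_R(n))."""
    rng = random.Random(seed)
    red, white = z0 * d0, (1 - z0) * d0   # initial urn composition
    n_r = 0
    for _ in range(n):
        z = red / (red + white)           # current urn proportion
        if rng.random() < z:              # red ball: assign treatment R
            n_r += 1
            # Assumed reinforcement: the response, truncated at zero.
            u = max(rng.gauss(m_r, sigma_r), 0.0)
            if z < eta:                   # red reinforced only below eta
                red += u
        else:                             # white ball: assign treatment W
            u = max(rng.gauss(m_w, sigma_w), 0.0)
            if z > delta:                 # white reinforced only above delta
                white += u
    return red / (red + white), n_r


# Hypothetical setting with R superior: Z_n is expected to settle near eta.
z_n, n_r = simulate_mrru(20000, 15.0, 5.0, 1.5, 1.5, delta=0.3, eta=0.7)
print(z_n, n_r / 20000)
```

In runs of this kind the urn proportion should approach $\eta $ when $m_R>m_W$ and $\delta $ when $m_R<m_W$, in line with (9); how quickly it does so depends on the difference between the means and on the initial composition.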

Moreover, from (9) both the sequences $N_R(n) = \sum _{i=1}^n X_i$ and $N_W(n) = \sum _{i=1}^n (1-X_i)$ diverge to infinity a.s. For this reason we can apply Proposition 3.1 of Aletti et al. (2013) concerning the adaptive estimators $\overline{M}(n)$ and $\overline{N}(n)$ defined in (7), which is a consequence of Theorem 2 of Melfi et al. (2001), i.e.

Proposition 1

The estimators $\overline{M}(n)$ and $\overline{N}(n)$ are consistent estimators of $m_R$ and $m_W$, respectively. Moreover as $n \rightarrow \infty $,

$$\begin{aligned} \left( \sqrt{N_R(n)}\frac{(\overline{M}(n)- m_R)}{\sigma _R},\sqrt{N_W(n)}\frac{(\overline{N}(n)- m_W)}{\sigma _W}\right) \rightarrow (\xi _1,\xi _2) \end{aligned}$$

in distribution, where $(\xi _1,\xi _2)$ are independent standard normal random variables.

This result gives us the asymptotic normality of the adaptive estimators $\overline{M}(n)$ and $\overline{N}(n)$. This result is very useful in an inferential setting, when a statistic based on the adaptive estimators is used. In particular, Proposition 1 provides the asymptotic normality of the test statistic, which justifies the term $z_{\frac{\alpha }{2}}$ in (8).

Let us fix a sample size $n$ higher than $n_0$ used in ${\mathcal {T}}_0$ (i.e., $n = c \cdot n_0$ with $c > 1$). For any $n> n_0$, we can identify the following intervals

$$\begin{aligned} I^{R_i}_n\ =\ \left\{ \ x \in (0,1)\ :\ (x,n) \in R_i\ \right\} ,\ \mathrm{with}\ R_i\in \{A,B,C\}. \end{aligned}$$

Observe that the intervals $(I^{R_i}_n)_i$ are pairwise disjoint and their union is a subset of $(0,1)$. We look for an adaptive test ${\mathcal {T}}$ represented in the proportion-sample size space by a point in region $A$ ($C$) when $R$ ($W$) is the inferior treatment. This goal is achieved if $\frac{N_R(n)}{n}\in I^{C}_n$ when $m_R>m_W$, and $\frac{N_R(n)}{n}\in I^{A}_n$ when $m_R<m_W$. Since (9) holds, we set $\delta \in I^A_n$ and $\eta \in I^C_n$. This implies that the test ${\mathcal {T}}=(p,n)$ is in the right region, i.e. where both conditions (a) and (b) are satisfied. In Fig. 2 we show how the urn process $Z_n$ converges towards the right region.
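For Gaussian responses with known variances, the endpoints of $I^A_n$ and $I^C_n$ can be obtained in closed form, because the condition $n_{\beta }(p)<n$ reduces to a quadratic inequality in $p$. The sketch below is one possible way to compute them; the function name `intervals` and the example values are hypothetical.

```python
from math import sqrt


def intervals(n, n0, p0, sigma_r, sigma_w):
    """Return (I_A, I_C), each as a pair of endpoints, for a given n > n0."""
    rho = sigma_r / (sigma_r + sigma_w)
    k = n * (rho**2 / (n0 * p0) + (1 - rho)**2 / (n0 * (1 - p0)))
    # n_beta(p) < n  <=>  k p^2 + (1 - 2 rho - k) p + rho^2 < 0
    b = 1 - 2 * rho - k
    disc = sqrt(b * b - 4 * k * rho**2)
    lo, hi = (-b - disc) / (2 * k), (-b + disc) / (2 * k)
    i_a = (lo, min(hi, p0 * n0 / n))            # also fewer patients on R
    i_c = (max(lo, 1 - (1 - p0) * n0 / n), hi)  # also fewer patients on W
    return i_a, i_c


# Hypothetical reference test (p0, n0) = (0.5, 100), unit variances, n = 125.
i_a, i_c = intervals(125, 100, 0.5, 1.0, 1.0)
print(i_a, i_c)
```

The thresholds $\delta $ and $\eta $ would then be picked inside `i_a` and `i_c`, respectively, for instance at their midpoints.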

Remark 1

It is worth observing that similar results can be proved for a one-sided test instead of (1), for instance $H_0:m_R\le m_W$ and $H_1:m_R> m_W$. In this case, the goal (b) is achieved when we assign more patients to treatment $W$, so we can arbitrarily fix the parameter $\delta $ within the interval $(0,\eta )$.

3 Different response distributions

In this section we relax some assumptions on response distributions. First, we consider Gaussian response distributions with unknown variances, then, we discuss the case of non-Gaussian responses (exponential and Bernoulli).

When the variances are unknown, the regions $A$, $B$ and $C$ cannot be defined a priori, since from (4) $n_{\beta }$ depends on $\rho =\frac{\sigma _R}{\sigma _R+\sigma _W}$. So, here we describe a convenient procedure to overcome this problem.

First, consider the adaptive estimators of the unknown variances $S_R^2(n)$ and $S_W^2(n)$, defined as follows

$$\begin{aligned} S_R^2(n)=\frac{\sum _{i=1}^n X_i (M_i-\overline{M}(n))^2}{N_R(n)-1},\quad \ \ \text {and}\ \ S_W^2(n)=\frac{\sum _{i=1}^n (1-X_i) (N_i-\overline{N}(n))^2}{N_W(n)-1}. \end{aligned}$$

Then, in (4) we replace the true variances $\sigma _R^2$ and $\sigma _W^2$ with their adaptive estimators $S_R^2(i)$ and $S_W^2(i)$, obtaining

$$\begin{aligned} n_{\beta }(p;\widehat{\rho }(i))\ :=\ \left( \frac{\widehat{\rho }^2(i)}{p} + \frac{(1-\widehat{\rho }(i))^2}{1-p}\right) \left( \frac{\widehat{\rho }^2(i)}{n_0p_0} + \frac{(1-\widehat{\rho }(i))^2}{n_0(1-p_0)}\right) ^{-1}, \end{aligned}$$

(10)

where $\widehat{\rho }(i)=\frac{S_R(i)}{S_R(i)+S_W(i)}$.

We note that $n_{\beta }(\cdot ;\widehat{\rho }(i))$ in (10) is a time dependent random function, since it depends on $\widehat{\rho }(i)$; at each step $i\le n$, a new response is collected, the adaptive estimators are updated and the function $n_{\beta }(\cdot ;\widehat{\rho }(i))$ changes. So, also the intervals $I^A_i, I^B_i, I^C_i$ will be random and they will change for any $i\le n$. This generates two sequences $(\delta _i)_i,(\eta _i)_i$ instead of two parameters $\delta ,\eta $, since we need to maintain the parameters of the urn model within the corresponding intervals: $\delta _i\in I^A_i$ and $\eta _i\in I^C_i$.

From Melfi and Page (2000) we have that the adaptive estimators $S_R^2(n)$ and $S_W^2(n)$ are strongly consistent, since the sequences $N_R(n)$ and $N_W(n)$ increase to infinity almost surely. Moreover, since $\widehat{\rho }(\cdot )$ and $n_{\beta }(p;\cdot )$ are continuous functions, the consistency of $S_R^2(n)$ and $S_W^2(n)$ implies that $n_{\beta }(p;\widehat{\rho }(i))\mathop {\longrightarrow }\limits ^{a.s.}n_{\beta }(p)$ for any $p\in (0,1)$. So, we have that $\delta _n\mathop {\longrightarrow }\limits ^{a.s.}\delta ,\eta _n\mathop {\longrightarrow }\limits ^{a.s.}\eta $ and $\delta \in I^A, \eta \in I^C$. This implies that $Z_n$ converges a.s. to $\delta $ when $m_R<m_W$ or to $\eta $ when $m_R>m_W$ [for further details see Ghiglietti (2014)].
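The plug-in step in (10) can be illustrated as follows. The sketch assumes that at least two responses are available on each treatment; the function names `rho_hat` and `n_beta_plugin` and the toy data are hypothetical.

```python
from statistics import stdev


def rho_hat(x, y):
    """Plug-in Neyman proportion from allocations x and responses y."""
    # stdev uses the (N - 1) denominator, as in the estimators S_R^2, S_W^2.
    s_r = stdev([yi for xi, yi in zip(x, y) if xi == 1])
    s_w = stdev([yi for xi, yi in zip(x, y) if xi == 0])
    return s_r / (s_r + s_w)


def n_beta_plugin(p, n0, p0, rho):
    """Boundary (10) evaluated at an estimated proportion rho."""
    num = rho**2 / p + (1 - rho)**2 / (1 - p)
    den = rho**2 / (n0 * p0) + (1 - rho)**2 / (n0 * (1 - p0))
    return num / den


# Toy data: two responses per treatment, recomputed as responses accrue.
x = [1, 1, 0, 0]
y = [0.0, 2.0, 0.0, 4.0]
r = rho_hat(x, y)
print(r, n_beta_plugin(0.3, 100, 0.5, r))
```

At each step the intervals for $\delta _i$ and $\eta _i$ would be recomputed from the updated boundary.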

When we relax the normality assumption on the reinforcement distributions, it is not easy to write the power function of the test in analytic form, solve the condition $\beta _{{\mathcal {T}}}(\Delta )\ge \beta _{{\mathcal {T}}_0}(\Delta )$ and then compute the function $n_{\beta }$. However, these computations can be carried out numerically; we will show that the $\textit{proportion-sample size}$ space can again be partitioned into the regions $A$, $B$ and $C$ even with non-Gaussian reinforcements.

Exponential responses:

Let us make the following assumptions on the responses

responses to treatment $R$: $M_1,M_2,\ldots ,M_{n_{0,R}}$ i.i.d. $\sim \mathcal {E}({\uplambda }_R)$.
responses to treatment $W$: $N_1,N_2,\ldots ,N_{n_{0,W}}$ i.i.d. $\sim \mathcal {E}({\uplambda }_W)$.

Our aim is to perform the following hypothesis test

$$\begin{aligned} H_0:\ {\uplambda }_R={\uplambda }_W\ \ \ \ \ \ \ \ \ vs\ \ \ \ \ \ \ \ \ H_1:\ {\uplambda }_R\ne {\uplambda }_W. \end{aligned}$$

(11)

The likelihood function of the whole sample is

$$\begin{aligned} L({\uplambda }_R,{\uplambda }_W,data)= & {} {\uplambda }_R^{n_{0,R}}{\uplambda }_W^{n_{0,W}}\exp \left( -{\uplambda }_R\sum _{i=1}^{n_{0,R}}M_i -{\uplambda }_W\sum _{i=1}^{n_{0,W}}N_i\right) \\= & {} \left( \ {\uplambda }_R^{p_0}{\uplambda }_W^{1-p_0}\exp \left( -{\uplambda }_R\overline{M}_{n_{0,R}}p_0 -{\uplambda }_W\overline{N}_{n_{0,W}}(1-p_0)\right) \ \right) ^n \end{aligned}$$

where $\overline{M}_{n_{0,R}}=\sum _{i=1}^{n_{0,R}}M_i/n_{0,R}$ and $\overline{N}_{n_{0,W}}=\sum _{i=1}^{n_{0,W}}N_i/n_{0,W}$. Then, the likelihood ratio test (see Lehmann and Romano 2005) gives us the following critical region

$$\begin{aligned}&\left\{ \ \frac{\sup _{{\uplambda }_R={\uplambda }_W\in (0,\infty )}L({\uplambda }_R,{\uplambda }_W,data)}{\sup _{({\uplambda }_R,{\uplambda }_W)\in (0,\infty )^2}L({\uplambda }_R,{\uplambda }_W,data)} \ <\ c_{\alpha }\ \right\} \\&\quad = \left\{ \ \frac{\overline{M}_{n_{0,R}}^{p_0}\ \cdot \ \overline{N}_{n_{0,W}}^{1-p_0}}{\overline{M}_{n_{0,R}}\cdot p_0+\overline{N}_{n_{0,W}}\cdot (1-p_0)} \ <\ \root n \of {c_{\alpha }}\ \right\} \end{aligned}$$

where $c_{\alpha }\in (0,1)$ can be determined to set the level of this critical region equal to $\alpha $.
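The statistic on the left-hand side of the critical region is the ratio between a weighted geometric mean and a weighted arithmetic mean of the two sample means, so it is scale-free and its null distribution does not depend on the common rate. One possible way to calibrate the threshold is therefore Monte Carlo simulation with rate one, as in the sketch below. The function names, the number of replications and the sample sizes are hypothetical choices.

```python
import random


def lrt_exponential(m, n):
    """n-th root of the likelihood ratio for two exponential samples."""
    p = len(m) / (len(m) + len(n))          # proportion allocated to R
    mean_m, mean_n = sum(m) / len(m), sum(n) / len(n)
    return mean_m**p * mean_n**(1 - p) / (p * mean_m + (1 - p) * mean_n)


def critical_value(n_r, n_w, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo alpha-quantile of the statistic under H0 (rate 1)."""
    rng = random.Random(seed)
    stats = sorted(
        lrt_exponential([rng.expovariate(1.0) for _ in range(n_r)],
                        [rng.expovariate(1.0) for _ in range(n_w)])
        for _ in range(reps)
    )
    return stats[int(alpha * reps)]


# H0 is rejected when the observed statistic falls below the threshold.
threshold = critical_value(20, 20)
observed = lrt_exponential([0.5, 1.5, 1.0], [4.0, 6.0, 5.0])
print(observed, threshold, observed < threshold)
```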

Bernoulli responses:

Let us make the following assumptions on patients’ responses

responses to treatment $R$: $M_1,M_2,\ldots ,M_{n_{0,R}}$ i.i.d. $\sim \mathcal {B}(p_R)$.
responses to treatment $W$: $N_1,N_2,\ldots ,N_{n_{0,W}}$ i.i.d. $\sim \mathcal {B}(p_W)$.

Let us consider now the following hypothesis test

$$\begin{aligned} H_0:\ p_R=p_W\ \ \ \ \ \ \ \ \ vs\ \ \ \ \ \ \ \ \ H_1:\ p_R\ne p_W. \end{aligned}$$

(12)

The likelihood function for two samples of Bernoulli variables is

$$\begin{aligned} \begin{aligned}&L(p_R,p_W,data)\\&\quad =\left( p_R^{\overline{M}_{n_{0,R}}p_0}(1-p_R)^{(1-\overline{M}_{n_{0,R}})p_0}p_W^{\overline{N}_{n_{0,W}}(1-p_0)} (1-p_W)^{(1-\overline{N}_{n_{0,W}})(1-p_0)}\right) ^n \end{aligned} \end{aligned}$$

Then, the likelihood ratio test, see Lehmann and Romano (2005), gives us the following critical region

$$\begin{aligned} \begin{aligned}&\left\{ \ \frac{\sup _{p_R=p_W\in (0,1)}L(p_R,p_W,data)}{\sup _{(p_R,p_W)\in (0,1)^2}L(p_R,p_W,data)} \ <\ c_{\alpha }\ \right\} \\&=\left\{ \ \frac{\overline{P}^{\overline{P}}(1-\overline{P})^{1-\overline{P}}}{\overline{M}_{n_{0,R}}^{\overline{M}_{n_{0,R}}p_0}(1-\overline{M}_{n_{0,R}})^{(1-\overline{M}_{n_{0,R}})p_0} \overline{N}_{n_{0,W}}^{\overline{N}_{n_{0,W}}(1-p_0)}(1-\overline{N}_{n_{0,W}})^{(1-\overline{N}_{n_{0,W}})(1-p_0)}} \ <\ \root n \of {c_{\alpha }}\ \right\} \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \overline{P}=\frac{\sum _{i=1}^{n_{0,R}}M_i+\sum _{i=1}^{n_{0,W}}N_i}{n}=\overline{M}_{n_{0,R}}p_0+\overline{N}_{n_{0,W}}(1-p_0). \end{aligned}$$

Also in this case, $c_{\alpha }\in (0,1)$ can be determined to set the level of this critical region equal to $\alpha $.
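The Bernoulli statistic can be computed in the same way. The sketch below relies on the fact that Python evaluates `0.0 ** 0.0` as `1.0`, which matches the usual convention $0^0=1$ for likelihoods with empty cells; the function name `lrt_bernoulli` and the toy samples are hypothetical.

```python
def lrt_bernoulli(m, n):
    """n-th root of the likelihood ratio for two samples of 0/1 responses."""
    p0 = len(m) / (len(m) + len(n))      # proportion allocated to R
    m_bar, n_bar = sum(m) / len(m), sum(n) / len(n)
    p_bar = p0 * m_bar + (1 - p0) * n_bar  # pooled success proportion
    # 0.0 ** 0.0 == 1.0 in Python, so empty cells contribute a factor 1.
    num = p_bar**p_bar * (1 - p_bar)**(1 - p_bar)
    den = (m_bar**(m_bar * p0) * (1 - m_bar)**((1 - m_bar) * p0)
           * n_bar**(n_bar * (1 - p0))
           * (1 - n_bar)**((1 - n_bar) * (1 - p0)))
    return num / den


# Equal success proportions give 1; complete separation gives 0.5 here.
print(lrt_bernoulli([1, 0], [0, 1]), lrt_bernoulli([1, 1], [0, 0]))
```

As in the exponential case, small values of the statistic are evidence against $H_0$.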

The power function ($\widehat{\beta }_{(p_0,n_0)}$) in both cases (11) and (12) can be numerically computed. For any $p\in (0,1)$, we define

$$\begin{aligned} n_{\beta }(p)\ :=\ \min \left\{ \ n\ge 1\ :\ \widehat{\beta }_{(p,n)}\ge \widehat{\beta }_{(p_0,n_0)}\ \right\} \end{aligned}$$

Once we have computed the function $n_{\beta }(\cdot )$, we partition the $\textit{proportion-sample size}$ space, we introduce the intervals $I_n^C$ and $I_n^A$ and we fix the parameters $\eta $ and $\delta $. As we can see from Fig. 3, the shape of the regions is the same as that of the regions computed in the case of Gaussian responses.

4 Simulation studies

In this section we show some simulation studies that aim at illustrating the theory presented in the previous sections of the paper. Let us consider the two-sided hypothesis test (1), for comparing the mean effect of two treatments $R$ and $W$. We simulated Gaussian responses to treatments $R$ and $W$ with parameters:

$m_W=10$,
$m_R\in \{5,7,9,9.5,10.5,11,13,15\}$,
equal variances: $\sigma _R^2=\sigma _W^2=1.5^2$,
different variances: $\sigma _R^2=1,\sigma _W^2=2^2$.

The test ${\mathcal {T}}_0$ is computed by setting the following parameters: $\alpha =0.05, \beta _0=0.9, \Delta _0=1, p_0=0.5$. Then, the sample size for ${\mathcal {T}}_0$ can be computed: it is $n_0=96$ when the variances are equal and $n_0=106$ when the variances are different.
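These sample sizes can be checked with the usual normal-approximation formula, which neglects the far tail of the two-sided power function. The sketch below rounds the number of subjects up separately on each arm, which is an assumption about the rounding convention; the function name `reference_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def reference_sample_size(delta0, sigma_r, sigma_w, p0=0.5,
                          alpha=0.05, beta0=0.9):
    """Total sample size giving power about beta0 at |delta| = delta0."""
    std_normal = NormalDist()
    k = (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(beta0))**2
    n_cont = k * (sigma_r**2 / p0 + sigma_w**2 / (1 - p0)) / delta0**2
    # Assumed convention: round up the number of subjects on each arm.
    return ceil(n_cont * p0) + ceil(n_cont * (1 - p0))


print(reference_sample_size(1.0, 1.5, 1.5))  # equal variances: 96
print(reference_sample_size(1.0, 1.0, 2.0))  # different variances: 106
```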

At this point, we apply the procedure described in Sect. 2 to obtain an adaptive test ${\mathcal {T}}$ based on the MRRU design that performs better than ${\mathcal {T}}_0$. The sample size of ${\mathcal {T}}$ is increased by 25% ($n=1.25\cdot n_0$), giving $n=120$ in the case of equal variances and $n=132$ in the case of different variances. In both cases, we can determine the regions $A, B$ and $C$ and the corresponding intervals $I^A_n, I^B_n$ and $I^C_n$:

$\sigma _R^2=1.5^2, \sigma _W^2=1.5^2\ \ \Rightarrow $ $I^A_{120}=(0.279,0.403)$, $I^C_{120}=(0.597,0.721)$.
$\sigma _R^2=1, \sigma _W^2=4\ \ \ \ \ \ \ \ \Rightarrow $ $I^A_{132}=(0.127,0.402)$, $I^C_{132}=(0.598,0.632)$.

In all simulations, the urn has been initialized with a total number of balls equal to $d_0=(m_R+m_W)/2$; the initial urn proportion $z_0$ has been set at the center of the interval $(\delta ,\eta )$. Then, for each value of $m_R \in \{5,7,9,9.5,10.5,11,13,15\}$, we have run $1000$ urn processes $(Z_k)_k$ stopped at time $n$.

In Table 1 (equal variances) and in Table 2 (different variances), we report the proportion of simulation runs in which the power of ${\mathcal {T}}$ is higher than the power of ${\mathcal {T}}_0$ (first column) and the proportion of replications in which ${\mathcal {T}}$ assigns fewer subjects than ${\mathcal {T}}_0$ to treatments $R$ and $W$ (second and third columns). Parentheses indicate the allocations to the superior treatment. In Fig. 4, we report side-by-side boxplots of the number of subjects assigned to the inferior treatment in the 1000 replications of the urn design, for different values of $\Delta $.

Table 1 Proportion of simulation runs in which ${\mathcal {T}}$ performs better than ${\mathcal {T}}_0$ in terms of power (first column) and subjects assigned to the inferior treatment (second/third column).

Table 2 Proportion of simulation runs in which ${\mathcal {T}}$ performs better than ${\mathcal {T}}_0$ in terms of power (first column) and subjects assigned to the inferior treatment (second/third column).

It is interesting to investigate the procedure described in Sect. 2 when the test ${\mathcal {T}}_0$ adopts an allocation related to the treatment performances. Let us consider the Optimal Adaptive Design for Bernoulli responses (RSIHR) presented in Rosenberger et al. (2001). The allocation proportion of this model converges to $p_0=\frac{\sqrt{p_R}}{\sqrt{p_R}+\sqrt{p_W}}$, which is the allocation that minimizes the expected number of failures at fixed power $\beta _0$, where $p_R$ and $p_W$ are the success probabilities of $R$ and $W$, respectively. Let us fix

Significance level $\alpha =0.05$ and the power $\beta _0=0.9$
Success probabilities: $p_R=0.2, p_W=0.1$

Then, ${\mathcal {T}}_0$ should have an allocation proportion $p_0=\frac{\sqrt{p_R}}{\sqrt{p_R}+\sqrt{p_W}}=0.586$ and sample size $n_0=516$.
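The two values above can be checked with a few lines of Python. The sample-size formula used here is a Wald-type normal approximation with unpooled variances, which is an assumption of this sketch rather than a method stated in the text; under that assumption the result appears consistent with the values quoted above. The function name `rsihr_design` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def rsihr_design(p_r, p_w, alpha=0.05, beta0=0.9):
    """Return the RSIHR target proportion and an approximate sample size."""
    p0 = sqrt(p_r) / (sqrt(p_r) + sqrt(p_w))
    std_normal = NormalDist()
    k = (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(beta0))**2
    # Assumed Wald-type approximation: variance of the difference of the
    # two sample proportions for a single observation, unpooled.
    unit_var = p_r * (1 - p_r) / p0 + p_w * (1 - p_w) / (1 - p0)
    n0 = ceil(k * unit_var / (p_r - p_w)**2)
    return p0, n0


p0, n0 = rsihr_design(0.2, 0.1)
print(round(p0, 3), n0)
```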

By following the procedure described in Sect. 2, we construct the test ${\mathcal {T}}$ with the MRRU model with sample size $n=645, \eta =0.724$ and $\delta =0.402$. We ran 200 replications and the results are reported in Fig. 5. For both tests ${\mathcal {T}}$ and ${\mathcal {T}}_0$, red boxplots indicate the power, blue boxplots represent the number of subjects assigned to the inferior treatment and green boxplots indicate the number of failures for $n=645$ subjects. Since ${\mathcal {T}}_0$ uses only $n_0=516$ subjects, we have counted the failures of the remaining $n-n_0=129$ subjects as if they had been assigned to the superior treatment.

5 Real case study

In this section we show a real case study, also presented in Ghiglietti and Paganoni (2014), where the application of the methodology presented in the paper would have improved the performance of a classical test, from both the statistical and the ethical point of view. We consider data concerning treatment times of patients affected by ST-Elevation Myocardial Infarction gathered in the MOMI$^2$ (MOnth MOnitoring Myocardial Infarction in MIlan) study (see Grieco et al. 2012). The main rescue procedure for these patients is Primary Angioplasty. It is well known that the time between the arrival at the ER (called Door) and the time of intervention (called Balloon) must be reduced as much as possible in order to improve the outcome of patients and reduce in-hospital mortality. So in this case the Door to Balloon time (DB) is the treatment response. We have two different treatments: the patients managed by the 118 (the toll-free number for emergencies in Italy) and the self-presented ones. We design our experiment to allocate the majority of patients to the better performing treatment, and simultaneously collect evidence for comparing the distributions of DB times.

Data are door-to-balloon times (DB) in minutes of 1179 patients. Among them, 657 subjects have been managed by 118, while the other 522 subjects reached the hospital by themselves. We identify treatment $W$ with the choice of calling 118 and treatment $R$ with the choice of going to the hospital on one's own. Treatment responses are represented by DB times. Since lower responses (DB times) indicate a better treatment, without loss of generality we transform the responses through a monotonic decreasing function. The true means and variances of populations $R$ and $W$ have been computed using all data, obtaining: $m_R=1.503, m_W=1.996, \sigma _R=0.518, \sigma _W=0.760$. The true difference of the means $\Delta =m_R-m_W=-0.493$ is negative, so $W$ is the superior treatment in this case.

Initially, we consider a test ${\mathcal {T}}_0$ to compare the mean effects of treatments $R$ and $W$. Let us fix $\alpha =0.01, \beta _0=0.95, \Delta _0=0.5$. The allocation proportion is empirically set equal to $p_0=0.468$. Response distributions are verified to be Gaussian. Then, for a two-sided t-test we need a total of $n_0=119$ subjects, $n_0p_0=56$ allocated to treatment $R$ and $n_0(1-p_0)=63$ allocated to treatment $W$. The power of test ${\mathcal {T}}_0$ evaluated at $\Delta $ is $\beta _{{\mathcal {T}}_0}(\Delta )=0.945$.

Now, consider the MRRU model to construct the adaptive test ${\mathcal {T}}$. The test ${\mathcal {T}}$ involves more subjects in the experiment than ${\mathcal {T}}_0$: in particular, $n$ is computed as $1.25\times n_0\approx 148$. Nevertheless, since in practice the variances are unknown, $n_0$ and $n$ are computed from variance estimators. As a consequence, the sample size of ${\mathcal {T}}$ is random and each replication of ${\mathcal {T}}$ has a different value of $n$.

We performed 500 simulation runs of the urn procedure. Each replication uses a subset of responses selected by permutation from the whole dataset. In Fig. 6, ten replications of the urn proportion process $(Z_n)_n$ are shown.

As we can see from Fig. 6, the urn process seems to target region $A$, where parameter $\delta $ is set. Then, test ${\mathcal {T}}$ has higher power and assigns to treatment $R$ fewer patients than ${\mathcal {T}}_0$. This is our goal, since $R$ is the inferior treatment in this case ($m_R<m_W$).
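The behaviour of the urn proportion can be explored with a small simulation. The following is a minimal sketch, assuming the MRRU update rule of Aletti et al. (2013): a red ball is drawn with probability $Z_n$, the drawn colour determines the treatment, and the urn is reinforced with the observed response only if $Z_n<\eta $ (red) or $Z_n>\delta $ (white). Several ingredients are assumptions made for illustration and are not taken from the case study: the thresholds `delta=0.3` and `eta=0.7`, the initial composition `r0=w0=1`, and Gaussian responses truncated at zero in place of responses resampled from the dataset. The function name `simulate_mrru` is hypothetical.

```python
import random


def simulate_mrru(n, delta, eta, m_r, s_r, m_w, s_w,
                  r0=1.0, w0=1.0, seed=0):
    """Simulate n allocations of a two-colour MRRU-type urn.

    Returns the number of subjects assigned to R and the path of the
    urn proportion Z_1, ..., Z_n. All default values are illustrative.
    """
    rng = random.Random(seed)
    red, white = r0, w0
    n_r = 0
    path = []
    for _ in range(n):
        z = red / (red + white)  # current proportion of red balls
        if rng.random() < z:
            # Red ball drawn: the subject receives treatment R.
            n_r += 1
            response = max(0.0, rng.gauss(m_r, s_r))  # assumed response model
            if z < eta:  # reinforce red only below the upper threshold
                red += response
        else:
            # White ball drawn: the subject receives treatment W.
            response = max(0.0, rng.gauss(m_w, s_w))  # assumed response model
            if z > delta:  # reinforce white only above the lower threshold
                white += response
        path.append(red / (red + white))
    return n_r, path


n_r, path = simulate_mrru(n=148, delta=0.3, eta=0.7,
                          m_r=1.503, s_r=0.518, m_w=1.996, s_w=0.760)
print(n_r, round(path[-1], 3))
```

Since $m_R<m_W$ here, white balls are reinforced more heavily on average, so the simulated proportion should drift towards the lower threshold, which is the qualitative behaviour visible in Fig. 6. Repeating the call with different seeds gives the spread of $N_R$ across replications.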

For each of the 500 replications we compute analytically the power evaluated at $\Delta $. In Fig. 7 we show a boxplot of the power over the 500 replications of the urn model, to be compared with the power of ${\mathcal {T}}_0$. Moreover, we show for each simulation the proportion of subjects assigned to treatment $R$, to be compared with the proportion of subjects assigned to treatment $R$ by ${\mathcal {T}}_0$.

From Fig. 7, we note that the MRRU design constructs a test ${\mathcal {T}}$ with power higher than ${\mathcal {T}}_0$. This occurs in more than 99 % of replications, and the average power over the replications is

$$\begin{aligned} \frac{1}{500}\sum _{i=1}^{500} \beta _{{\mathcal {T}}i}(\Delta )\ =\ 0.975\ >\ 0.945\ =\ \beta _{{\mathcal {T}}_0}(\Delta ). \end{aligned}$$

Even if ${\mathcal {T}}$ uses a sample size $n$ larger than that of ${\mathcal {T}}_0$, in 52.6 % of the runs the number of subjects allocated to the inferior treatment $R$ by ${\mathcal {T}}$ is less than that allocated by ${\mathcal {T}}_0$. Moreover, the average number of units assigned to treatment $R$ is almost the same as the number computed with ${\mathcal {T}}_0$:

$$\begin{aligned} \frac{1}{500}\sum _{i=1}^{500} N_{Ri}\ =\ 56.43\ \simeq \ 56\ =\ n_0\cdot p_0. \end{aligned}$$

6 Conclusions

In this paper we analyze the statistical properties of tests that compare the means of the responses to two treatments. Given a test ${\mathcal {T}}_0$, we point out which features a response-adaptive test ${\mathcal {T}}$ should have in order to perform better than ${\mathcal {T}}_0$. In a clinical trials framework, this goal is achieved when ${\mathcal {T}}$ (a) has higher power and (b) assigns fewer subjects to the inferior treatment than ${\mathcal {T}}_0$. Specifically, we identify in the proportion-sample size space the subregions from which the allocation proportion $p$ and the sample size $n$ of the test ${\mathcal {T}}$ should be selected.

The test ${\mathcal {T}}$ can be implemented by using a response-adaptive design. We propose an urn procedure (MRRU) that is able to target a fixed allocation proportion in (0, 1). This urn model places the test ${\mathcal {T}}$ in a specific region, depending on the inferior treatment, so that both goals (a) and (b) are accomplished. We show that the assumptions of Gaussian responses and known variances can be relaxed. We report some simulations and a case study that highlight the good performance of the procedure.

References

Aletti G, Ghiglietti A, Paganoni AM (2013) Randomly reinforced urn designs with prespecified allocations. J Appl Probab 50(2):486–498
Atkinson F, Biswas A (2014) Randomised response-adaptive designs in clinical trials. CRC Press, Boca Raton
Bai ZD, Hu F, Zhang LX (2002) Gaussian approximation theorems for urn models and their applications. Ann Appl Probab 12:1149–1173
Bai ZD, Hu F (2005) Asymptotics in randomized urn models. Ann Appl Probab 15:914–940
Beggs AW (2005) On the convergence of reinforcement learning. J Econ Theory 122:1–36
Cheung ZD, Hu F, Zhang LX (2006) Asymptotic theorems of sequential estimation-adjusted urn models. Ann Appl Probab 16:340–369
Cheung ZD, Chan WS, Hu F, Zhang LX (2011) Immigrated urn models—theoretical properties and applications. Ann Stat 39(1):643–671
Durham SC, Yu KF (1990) Randomized play-the-leader rules for sequential sampling from two populations. Probab Eng Inf Sci 26(4):355–367
Durham SC, Flournoy N, Li W (1996) Randomized Pólya urn designs. In: Proceedings Biometric Section of the American Statistical Association. Am Stat Assoc, Alexandria, VA, pp 166–170
Durham SC, Flournoy N, Li W (1998) A sequential design for maximizing the probability of a response. Can J Stat 26(3):479–495
Flournoy N, May C (2009) Asymptotics in response-adaptive designs generated by a two-color, randomly reinforced urn. Ann Stat 37:1058–1078
Flournoy N, May C, Secchi P (2012) Asymptotically optimal response-adaptive designs for allocating the best treatment: an overview. Int Stat Rev 80(2):293–305
Ghiglietti A, Paganoni AM (2014) Statistical properties of two-color randomly reinforced urn design targeting fixed allocations. Electron J Stat 8(1):708-170
Ghiglietti A (2014) Statistical properties of urn models in response-adaptive designs. PhD thesis. Politecnico di Milano
Grieco N, Ieva F, Paganoni AM (2012) Performance assessment using mixed effects models: a case study on coronary patient care. IMA J Manag Math 23:117–131
Hu F, Rosenberger WF (2006) The theory of response-adaptive randomization in clinical trials. Wiley, Hoboken, NJ
Janson S (2004) Functional limit theorems for multitype branching processes and generalized Pólya urns. Stoch Process Appl 110:177–245
Lachin JM, Rosenberger WF (2002) Randomization in clinical trials. Wiley, New York
Lehmann EL, Romano JP (2005) Testing statistical hypotheses. Springer, New York
Laruelle S, Pagès G (2013) Randomized urn models revisited using stochastic approximation. Ann Appl Probab 23(4):1409–1436
May C, Paganoni AM, Secchi P (2005) On a two-color generalized Polya urn. Metron, vol. LXIII. p 1
Melfi F, Page C (2000) Estimation after adaptive allocation. J Stat Plan Inference 87:353–363
Melfi F, Page C, Geraldes M (2001) An adaptive randomized design with application to estimation. Can J Stat 29:107–116
Muliere P, Paganoni AM, Secchi P (2006) A randomly reinforced urn. J Stat Plan Inference 136:1853–1874
Paganoni AM, Secchi P (2007) A numerical study for comparing two response-adaptive designs for continuous treatment effects. Stat Methods Appl 16:321–346
R Development Core Team (2011) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/
Rosenberger WF, Stallard N, Ivanova A, Harper CN, Ricks ML (2001) Optimal adaptive designs for binary response trials. Biometrics 57:909–913
Rosenberger WF (2002) Randomized urn models and sequential design. Seq Anal 21(1&2):1–28

Author information

Authors and Affiliations

Università degli Studi di Milano, Milan, Italy
Andrea Ghiglietti
Politecnico di Milano, Milan, Italy
Anna Maria Paganoni

Corresponding author

Correspondence to Andrea Ghiglietti.

About this article

Cite this article

Ghiglietti, A., Paganoni, A.M. An urn model to construct an efficient test procedure for response adaptive designs. Stat Methods Appl 25, 211–226 (2016). https://doi.org/10.1007/s10260-015-0314-y

Accepted: 23 April 2015
Published: 21 May 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10260-015-0314-y
