Statistical Properties of Exclusive and Non-exclusive Online Randomized Experiments Using Bucket Reuse

Schultzberg, Mårten; Kjellin, Oskar; Rydberg, Johan

doi:10.1007/978-3-030-89906-6_50


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 358)

Included in the following conference series:

Proceedings of the Future Technologies Conference


Abstract

Randomized experiments are a key part of product development in the tech industry. It is often necessary to run programs of exclusive experiments, i.e., groups of experiments that cannot be run on the same units during the same time. These programs impose restrictions on the random sampling, as units that are currently in an experiment cannot be sampled into a new one. Moreover, to technically enable this type of coordination with large populations, the units in the population are often grouped into ‘buckets’, and sampling is then performed on the bucket level. This paper investigates statistical implications of both the restricted sampling and the bucket-level sampling. The contribution of this paper is threefold. First, bucket sampling is connected to the existing literature on randomized experiments in complex sampling designs, which makes it possible to establish properties of the difference-in-means estimator of the average treatment effect. These properties are needed for inference to the population under random sampling of buckets. Second, the bias introduced by restricting the sampling, as imposed by programs of exclusive experiments, is derived. Finally, practical recommendations on how to empirically evaluate and handle this bias are discussed, together with simulations that support the theoretical findings.

The authors thank Andreas Born, Claire Detilleux, Brian St Thomas, Michael Stein, and the colleagues in the experimentation platform team for helpful feedback and suggestions for this paper.


Notes

1.
P-hacking still exists, though, since it is easier to put a feature into production if there is a ‘significant’ experiment result to back it up.
2.
Assuming even splits into treatment and control, which are often chosen for efficiency reasons.
3.
Assuming a program with many experiments over time.
4.
The peaks are artefacts of the $\delta $’s being multiples of the lengths of the experiments.

References

[1] Amrhein, V., Greenland, S., McShane, B.: Scientists rise up against statistical significance. Nature 567(7748), 305–307 (2019)
[2] Dai, B., Ding, S., Wahba, G.: Multivariate Bernoulli distribution. Bernoulli 19(4), 1465–1483 (2013)
[3] Dawid, A.P.: Conditional independence in statistical theory. J. R. Stat. Soc. Ser. B (Methodol.) 41(1), 1–31 (1979)
[4] Fisher, R.A.: The Design of Experiments. Oliver and Boyd, Edinburgh (1935)
[5] Hern, A.: Why Google has 200m reasons to put engineers over designers, February 2014
[6] Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952)
[7] Ioannidis, J.P.A.: Why most published research findings are false. PLoS Med. 2(8), e124 (2005)
[8] Johansson, P., Schultzberg, M.: Rerandomization strategies for balancing covariates using pre-experimental longitudinal data. J. Comput. Graph. Stat. 29(4), 798–813 (2020)
[9] Kish, L., Frankel, M.R.: Inference from complex samples. J. R. Stat. Soc. Ser. B (Methodol.) 36(1), 1–22 (1974)
[10] Knuth, D.E.: The Art of Computer Programming: Volume 3: Sorting and Searching. Pearson Education, London (1998)
[11] Kohavi, R., Thomke, S.: The surprising power of online experiments. Harvard Bus. Rev. 95, 74–82 (2017)
[12] Pradhan, B.K.: On efficiency of cluster sampling on sampling on two occasions. Statistica 64(1), 183–191 (2007)
[13] Lohr, S.L.: Sampling: Design and Analysis. Chapman & Hall/CRC Texts in Statistical Science. CRC Press (2019)
[14] Neyman, J.: On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. R. Stat. Soc. 97(4), 558 (1934)
[15] Neyman, J.: On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Stat. Sci. 5(4), 465–472 (1990; original 1923)
[16] Nordin, M., Schultzberg, M.: Properties of restricted randomization with implications for experimental design. arXiv preprint arXiv:2006.14888 (2020)
[17] Rubin, D.B.: Inference using potential outcomes: design, modeling, decisions. J. Am. Stat. Assoc. 100(469), 322–331 (2005)
[18] Student (Gosset, W.S.): The probable error of a mean. Biometrika 6(1), 1–25 (1908)
[19] Sukhatme, P.V., Sukhatme, S.: Sampling Theory of Surveys with Applications. Iowa State University Press, Iowa City (1984)
[20] Tang, D., Agarwal, A., O’Brien, D., Meyer, M.: Overlapping experiment infrastructure: more, better, faster experimentation. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 17–26 (2010)
[21] van den Brakel, J., Renssen, R.: Design and analysis of experiments embedded in sample surveys. J. Off. Stat. 14(3), 277–295 (1998)
[22] van den Brakel, J., Renssen, R.: Analysis of experiments embedded in complex sampling designs. Surv. Methodol. 31(1), 23–40 (2005)


Author information

Authors and Affiliations

Experimentation Platform Team – Spotify, Stockholm, Sweden
Mårten Schultzberg, Oskar Kjellin & Johan Rydberg


Corresponding author

Correspondence to Mårten Schultzberg.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

Appendices

Appendix

All theorems are repeated here for convenience.

1.1 Complete Random Sampling of Buckets

Theorem 4

Under random sampling of equally sized buckets, with random treatment assignment into two equally sized groups, the sample difference-in-means estimator is an unbiased estimator of the ATE, i.e.,

$$\begin{aligned} E[\widehat{ATE}] = ATE, \end{aligned}$$

where expectation is taken over the design space of random samples and treatment allocations.

Proof

Under random sampling of equally sized buckets, it follows that $\pi _i=\frac{N_S}{N}\,\,\forall \,\, i=1,...,N$, which implies that the Horvitz-Thompson estimator simplifies as

$$\begin{aligned} \hat{\bar{Y}}_{\pi _w}^w=\frac{N_S}{N \frac{N_S}{2}} \frac{N}{N_S} \sum _{i\in S_s} Y_i(W=w)=\frac{1}{\frac{N_S}{2}}\sum _{i\in S_s} Y_i(W=w)=\bar{Y}^w, \end{aligned}$$

(8)

which is simply the sample mean of group $w$. Unbiasedness follows directly from the results in [6], but we give an alternative proof here to build intuition. We drop the superscript $t$ on the ATE to ease notation. Denote the difference-in-means estimator of the average treatment effect by

$$\begin{aligned} \widehat{ATE}= \frac{1}{N_S/2} \left( \sum _{i:i\in \mathbf {S}\cap W_i=1} Y_i(W_i) -\sum _{i:i\in \mathbf {S}\cap W_i=0} Y_i(W_i) \right) . \end{aligned}$$

(9)

Enumerate all possible samples under random sampling of buckets as $\mathbf {S}_s$, where $s=1,...,\text {card}(\mathcal {S}_B)$. Moreover, enumerate all possible treatment assignments under random treatment assignment as $W^j$, where $j=1,...,\text {card}(\mathcal {W})$. Any single estimate can then be written as

$$\begin{aligned} \widehat{ATE}_{s,j}= \frac{1}{N_S/2} \left( \sum _{i:i\in \mathbf {S}_s\cap W^j_i=1} Y_i(W_i^j) -\sum _{i:i\in \mathbf {S}_s\cap W^j_i=0} Y_i(W_i^j) \right) . \end{aligned}$$

(10)
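Eqs. (8) and (10) can be checked against each other numerically. The Python sketch below is an illustration only, not the authors' implementation; the function names, the toy outcomes, and the sizes are hypothetical. It computes one estimate $\widehat{ATE}_{s,j}$ for a fixed bucket sample and a fixed balanced assignment, and compares it with the difference of the two Horvitz-Thompson group means using $\pi_w = N_S/(2N)$ (probability of being sampled times probability of landing in group $w$). Exact rational arithmetic keeps the comparison free of rounding.

```python
from fractions import Fraction

def dim_estimate(y0, y1, sample_units, treated):
    """Difference-in-means estimate for one sample and one assignment (Eq. 10).

    Treated units reveal Y(1), the other sampled units reveal Y(0);
    each group holds N_S / 2 units.
    """
    half = len(sample_units) // 2
    t_sum = sum(y1[i] for i in sample_units if i in treated)
    c_sum = sum(y0[i] for i in sample_units if i not in treated)
    return Fraction(t_sum - c_sum, half)

def ht_group_mean(outcomes, pi_w, N):
    """Horvitz-Thompson estimate of a population mean: weight each value by 1 / pi_w."""
    return sum(Fraction(y) / pi_w for y in outcomes) / N

# Hypothetical toy population: N = 8 units in B = 4 buckets of N_B = 2 units.
y0 = [3, 1, 4, 1, 5, 9, 2, 6]
y1 = [2, 7, 1, 8, 2, 8, 1, 8]
N, N_S = len(y0), 4
sample_units = [0, 1, 4, 5]   # buckets 0 and 2 were sampled
treated = {1, 4}              # one balanced assignment

pi_w = Fraction(N_S, N) * Fraction(1, 2)  # P(sampled) * P(assigned to group w)
ht_diff = (ht_group_mean([y1[i] for i in sample_units if i in treated], pi_w, N)
           - ht_group_mean([y0[i] for i in sample_units if i not in treated], pi_w, N))
print(dim_estimate(y0, y1, sample_units, treated), ht_diff)  # -3/2 -3/2
```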

The expected value of $\widehat{ATE}_B$, where the subscript B indicates random sampling of buckets from $\mathcal {B}$, is given by

$$\begin{aligned} E[\widehat{ATE}_B]&= \frac{1}{\text {card}(\mathcal {S}_B)} \sum _{s=1}^{\text {card}(\mathcal {S}_B)}\left( \frac{1}{\text {card}(\mathcal {W})} \sum _{j=1}^{\text {card}(\mathcal {W})} \widehat{ATE}_{s,j} \right) \\&= \frac{1}{\text {card}(\mathcal {S}_B)} \frac{1}{\text {card}(\mathcal {W})} \sum _{s=1}^{\text {card}(\mathcal {S}_B)} \sum _{j=1}^{\text {card}(\mathcal {W})} \widehat{ATE}_{s,j} \\&= \frac{1}{\text {card}(\mathcal {S}_B)} \frac{1}{\text {card}(\mathcal {W})} \frac{2}{N_S} \left( {\sum _{s=1}^{\text {card}(\mathcal {S}_B)} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=1} Y_i(W_i^j)} \right. \\&\left. -\,\sum _{s=1}^{\text {card}(\mathcal {S}_B)} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=0} Y_i(W_i^j) \right) , \end{aligned}$$

where the last step follows from the equal bucket sizes, which make the sample size $N_S$ identical across samples. Due to the symmetry of random sampling of buckets and the equal bucket sizes, each bucket (and thereby each unit) is included in equally many samples. Moreover, within each sample, each unit is assigned to the treatment group and to the control group equally many times, due to the mirror property of randomization distributions [8, 16]. Hence we are simply adding and subtracting the value of each unit the same number of times. It follows that

$$\begin{aligned} E[\widehat{ATE}_B]&= \frac{1}{\text {card}(\mathcal {S}_B)} \frac{1}{\text {card}(\mathcal {W})} \frac{2}{N_S} \left( \sum _{s=1}^{\text {card}(\mathcal {S}_B)} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=1} Y_i(W_i^j) \right. \\&\left. -\sum _{s=1}^{\text {card}(\mathcal {S}_B)} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=0} Y_i(W_i^j) \right) \\&= \frac{1}{\text {card}(\mathcal {S}_B)} \frac{1}{\text {card}(\mathcal {W})} \frac{2}{N_S} \left( \xi _B \sum _{i=1}^N Y_i(1) -\xi _B \sum _{i=1}^N Y_i(0) \right) . \end{aligned}$$

The number of times that each unit is in the treatment group (and, equally, in the control group) across all possible samples and treatment assignments is given by

$$\begin{aligned} \xi _B&= \frac{\text {card}(\mathcal {W})}{2} \times \left( {\begin{array}{c}B-1\\ \frac{N_S}{N_B}-1\end{array}}\right) , \end{aligned}$$

which implies

$$\begin{aligned} E[\widehat{ATE}_B]&= \frac{1}{\text {card}(\mathcal {S}_B)} \frac{\left( {\begin{array}{c}B-1\\ \frac{N_S}{N_B}-1\end{array}}\right) }{N_S} \left( \sum _{i=1}^N Y_i(1) -\sum _{i=1}^N Y_i(0) \right) . \end{aligned}$$

We note that

$$\begin{aligned} \text {card}(\mathcal {S}_B) = \frac{B}{\frac{N_S}{N_B}} \left( {\begin{array}{c}B-1\\ \frac{N_S}{N_B}-1\end{array}}\right) \end{aligned}$$

which gives the final expression

$$\begin{aligned} E[\widehat{ATE}_B]&= \frac{N_S}{N_B B\left( {\begin{array}{c}B-1\\ \frac{N_S}{N_B}-1\end{array}}\right) } \frac{\left( {\begin{array}{c}B-1\\ \frac{N_S}{N_B}-1\end{array}}\right) }{N_S} \left( \sum _{i=1}^N Y_i(1) -\sum _{i=1}^N Y_i(0) \right) \\&= \frac{1}{N_B B} \left( \sum _{i=1}^N Y_i(1) -\sum _{i=1}^N Y_i(0) \right) \\&= \frac{1}{N} \left( \sum _{i=1}^N Y_i(1) -\sum _{i=1}^N Y_i(0) \right) =ATE. \end{aligned}$$

This agrees with [6], as expected. $\blacksquare $
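For a small population, the whole argument can be verified exhaustively. The Python sketch below is illustrative only (the toy outcomes, the sizes, and the function name are hypothetical) and assumes complete randomization within the sample, so that $\text{card}(\mathcal {W}) = \binom{N_S}{N_S/2}$. It enumerates every pair of bucket sample and balanced assignment, checks the counts $\xi _B$ and $\text{card}(\mathcal {S}_B)$ used in the proof, and confirms that the average estimate equals the ATE.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def enumerate_estimates(y0, y1, N_B, N_S):
    """Yield (units, treated, estimate) for every bucket sample and balanced assignment."""
    B, k = len(y0) // N_B, N_S // N_B
    for buckets in combinations(range(B), k):            # all card(S_B) bucket samples
        units = [b * N_B + u for b in buckets for u in range(N_B)]
        for treated in combinations(units, N_S // 2):    # all card(W) assignments
            treated = set(treated)
            diff = sum(y1[i] if i in treated else -y0[i] for i in units)
            yield units, treated, Fraction(diff, N_S // 2)

# Hypothetical toy population: N = 8 units in B = 4 buckets of N_B = 2; sample N_S = 4.
y0 = [3, 1, 4, 1, 5, 9, 2, 6]
y1 = [2, 7, 1, 8, 2, 8, 1, 8]
N_B, N_S = 2, 4
B, k = len(y0) // N_B, N_S // N_B

draws = list(enumerate_estimates(y0, y1, N_B, N_S))
# card(S_B) * card(W) equally likely (sample, assignment) pairs
assert len(draws) == comb(B, k) * comb(N_S, N_S // 2)

# xi_B: how often unit 0 (in bucket 0) is treated, versus card(W)/2 * C(B-1, k-1)
xi_B = sum(1 for _, treated, _ in draws if 0 in treated)
assert xi_B == comb(N_S, N_S // 2) // 2 * comb(B - 1, k - 1)

# card(S_B) = (B / k) * C(B-1, k-1) is the binomial coefficient C(B, k)
assert Fraction(B, k) * comb(B - 1, k - 1) == comb(B, k)

expected = sum(est for _, _, est in draws) / len(draws)
ate = Fraction(sum(y1) - sum(y0), len(y0))
print(expected, ate)  # 3/4 3/4
```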

1.2 Restricted Random Sampling of Buckets

Lemma 2

For a $\delta _{\,\,\perp \!\!\! \perp }$ fulfilling Condition 1, the difference-in-means estimator $\widehat{ATE^{\tilde{\mathcal {B}}_t}_{t-\delta _{\,\,\perp \!\!\! \perp }}}$ is an unbiased estimator of $ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}$, i.e.,

$$\begin{aligned} E[\widehat{ATE^{\tilde{\mathcal {B}}_t}_{t-\delta _{\,\,\perp \!\!\! \perp }}}] = ATE_{t-\delta _{\,\,\perp \!\!\! \perp }} \end{aligned}$$

(11)

Proof

Since a program of exclusive experiments is expected to change user behaviour, and therefore users’ reactions to future changes, it generally holds that

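$$\begin{aligned} Y_t(0), Y_t(1) \,\,\not\!\perp \!\!\! \perp \tilde{\mathcal {B}}_t. \end{aligned}$$

(12)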

However, the only dependency between the potential outcomes and the set of available buckets is captured by the history of the available buckets. The samples are drawn at random from the set of available buckets; if different subsets have had different experiences that affect their behaviour such that the ATE changes, the dependency between the sample and these changes is completely described by the history of $\tilde{\mathcal {B}}_t$. It follows from Condition 1 that

$$\begin{aligned} \tilde{\mathcal {B}}_t \,\,\perp \!\!\! \perp \tilde{\mathcal {B}}_{t-\delta }, \tilde{\mathcal {B}}_{t-\delta -1},...,\tilde{\mathcal {B}}_{1} \Rightarrow Y_{t-\delta _{\,\,\perp \!\!\! \perp }}(0), Y_{t-\delta _{\,\,\perp \!\!\! \perp }}(1) \,\,\perp \!\!\! \perp \tilde{\mathcal {B}}_t. \end{aligned}$$

(13)

In other words, with respect to the potential outcomes at time $t-\delta _{\,\,\perp \!\!\! \perp }$, the subset $\tilde{\mathcal {B}}_t$ is a random subset of $\mathcal {B}$. One way to understand this is that, from a randomization perspective, the following two procedures are equivalent: (1) randomly selecting a subset at time $t$, randomly sampling from that subset, and finally randomly assigning the treatment; and (2) semi-randomly selecting subsets in steps until the subset is independent of the starting set (at step $\delta _{\,\,\perp \!\!\! \perp }$), and then sampling and assigning the treatment randomly. The semi-random subsetting in $\delta _{\,\,\perp \!\!\! \perp }$ steps is essentially an inefficient method for randomly drawing a subset. The important practical difference between (1) and (2) is that, in a program of exclusive experiments, the potential outcomes are expected to change as a function of the subsetting steps in (2). For this reason, performing (1) at time $t$ yields a subset that is independent of the potential outcomes at time $t$ and at all other time periods, whereas (2) yields a subset that is independent only of the potential outcomes at time $t-\delta _{\,\,\perp \!\!\! \perp }$ and earlier.

Since $\tilde{\mathcal {B}}_t$ is a random subset of buckets with respect to the potential outcomes at time $t-\delta _{\,\,\perp \!\!\! \perp }$ under Condition 1, to prove Lemma 1 it only remains to show that the difference-in-means estimator is an unbiased estimator of $ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}$ under random ‘sampling’ of a subset, random sampling within the subset, and random treatment assignment within the sample. Enumerate the possible subsets $\tilde{\mathcal {B}}_t^b$ of a given size as $b=1,...,\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) $, where each subset is equally probable with respect to the potential outcomes at time $t-\delta _{\,\,\perp \!\!\! \perp }$. In each subset, enumerate all possible samples $\mathbf {S}_s$ as $s=1,...,\left( {\begin{array}{c}\text {card}(\tilde{\mathcal {B}}_t)\\ \frac{N_S}{N_B}\end{array}}\right) $. Using results from the proof of Theorem 1, it follows that

$$\begin{aligned} E[\widehat{ATE^{\tilde{\mathcal {B}}_t}_{t-\delta _{\,\,\perp \!\!\! \perp }}}]&= \frac{1}{\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) } \sum _{b=1}^{\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) } \frac{1}{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}^b_t})} \frac{1}{\text {card}(\mathcal {W})} \frac{2}{N_S} \\ {}&\left( \sum _{s=1}^{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}^b_t})} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=1} Y_{i,t-\delta _{\,\,\perp \!\!\! \perp }}(W_i^j) \right. \\&\left. -\sum _{s=1}^{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}^b_t})} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=0} Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(W_i^j) \right) \\&= \frac{1}{\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) } \frac{1}{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}_t})} \frac{1}{\text {card}(\mathcal {W})} \frac{2}{N_S} \\&\left( \sum _{b=1}^{\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) } \sum _{s=1}^{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}^b_t})} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=1} Y_{i,t-\delta _{\,\,\perp \!\!\! \perp }}(W_i^j) \right. \\&\left. - \sum _{b=1}^{\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) } \sum _{s=1}^{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}^b_t})} \sum _{j=1}^{\text {card}(\mathcal {W})} \sum _{i:i\in \mathbf {S}_s\cap W^j_i=0} Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(W_i^j) \right) \end{aligned}$$

where, similar to the proof of Theorem 1, each unit is included in equally many subsets and samples, and within each sample each unit is included in the treatment group and the control group equally many times. It follows that

$$\begin{aligned} E[\widehat{ATE^{\tilde{\mathcal {B}}_t}_{t-\delta _{\,\,\perp \!\!\! \perp }}}]&= \frac{1}{\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) } \frac{1}{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}_t})} \frac{1}{\text {card}(\mathcal {W})} \frac{2}{N_S} \left( \tilde{\xi } \sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(1) -\tilde{\xi }\sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(0) \right) , \end{aligned}$$

where

$$\begin{aligned} \tilde{\xi } = \frac{\text {card}(\mathcal {W})}{2} \times \left( {\begin{array}{c}\text {card}(\tilde{\mathcal {B}}_t)-1\\ \frac{N_S}{N_B}-1\end{array}}\right) \times \left( {\begin{array}{c}B-1\\ \text {card}(\tilde{\mathcal {B}}_t)-1\end{array}}\right) . \end{aligned}$$

(14)

Note that

$$\begin{aligned} \text {card}(\mathcal {S}_{\tilde{\mathcal {B}}_t})&= \frac{\text {card}(\tilde{\mathcal {B}}_t)}{\frac{N_S}{N_B}} \left( {\begin{array}{c}\text {card}(\tilde{\mathcal {B}}_t)-1\\ \frac{N_S}{N_B}-1\end{array}}\right) \\ \left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right)&= \frac{B}{\text {card}(\tilde{\mathcal {B}}_t)}\left( {\begin{array}{c}B-1\\ \text {card}(\tilde{\mathcal {B}}_t)-1\end{array}}\right) . \end{aligned}$$
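The counting argument behind $\tilde{\xi }$ in (14), and the two identities just stated, can be checked by brute force on small cases. In the Python sketch below (illustrative; the function names and test sizes are hypothetical), unit 0 belongs to bucket 0, `c` plays the role of $\text {card}(\tilde{\mathcal {B}}_t)$, and every available subset of size `c`, every bucket sample within it, and every balanced assignment are enumerated.

```python
from itertools import combinations
from math import comb

def xi_tilde_brute_force(B, N_B, N_S, c):
    """Count (subset, sample, assignment) triples in which unit 0 is treated."""
    k = N_S // N_B
    hits = 0
    for subset in combinations(range(B), c):           # equally likely available sets
        for buckets in combinations(subset, k):        # bucket samples within the set
            if 0 not in buckets:
                continue
            units = [b * N_B + u for b in buckets for u in range(N_B)]
            hits += sum(1 for treated in combinations(units, N_S // 2) if 0 in treated)
    return hits

def xi_tilde_formula(B, N_B, N_S, c):
    """card(W)/2 * C(c-1, k-1) * C(B-1, c-1), as in (14)."""
    k = N_S // N_B
    return comb(N_S, N_S // 2) // 2 * comb(c - 1, k - 1) * comb(B - 1, c - 1)

for B, N_B, N_S, c in [(6, 2, 4, 4), (7, 1, 4, 5), (5, 2, 4, 3)]:
    k = N_S // N_B
    assert xi_tilde_brute_force(B, N_B, N_S, c) == xi_tilde_formula(B, N_B, N_S, c)
    # card(S_{B~_t}) = (c / k) * C(c-1, k-1) equals C(c, k)
    assert c * comb(c - 1, k - 1) // k == comb(c, k)
    # C(B, c) = (B / c) * C(B-1, c-1)
    assert B * comb(B - 1, c - 1) // c == comb(B, c)
print("xi-tilde and both identities check out")
```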

Putting this together it follows that

$$\begin{aligned} E[\widehat{ATE^{\tilde{\mathcal {B}}_t}_{t-\delta _{\,\,\perp \!\!\! \perp }}}]&= \frac{1}{\left( {\begin{array}{c}B\\ \text {card}(\tilde{\mathcal {B}}_t)\end{array}}\right) } \frac{1}{\text {card}(\mathcal {S}_{\tilde{\mathcal {B}}_t})} \frac{1}{\text {card}(\mathcal {W})} \frac{2}{N_S} \tilde{\xi } \left( \sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(1) -\sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(0) \right) \\&=\frac{1}{\frac{B}{\text {card}(\tilde{\mathcal {B}}_t)}} \frac{1}{\frac{\text {card}(\tilde{\mathcal {B}}_t)}{\frac{N_S}{N_B}} } \frac{1}{N_S} \left( \sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(1) -\sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(0) \right) \\&=\frac{1}{N_BB} \left( \sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(1) -\sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(0) \right) \\&=\frac{1}{N} \left( \sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(1) -\sum _{i=1}^N Y_{i, t-\delta _{\,\,\perp \!\!\! \perp } }(0) \right) =ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}\,\, \blacksquare \end{aligned}$$
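Lemma 2 can likewise be checked exactly on a toy population. The Python sketch below is an illustration with hypothetical outcomes and sizes, not the authors' code. It averages the difference-in-means estimate over every equally likely available subset, every bucket sample within it, and every balanced assignment, using outcomes that stand in for the potential outcomes at time $t-\delta _{\,\,\perp \!\!\! \perp }$, and compares the result with the population ATE. Because all subsets have the same size, a flat average over all triples equals the nested average in the proof.

```python
from fractions import Fraction
from itertools import combinations

def expected_dim_restricted(y0, y1, N_B, N_S, c):
    """Exact E[ATE_hat] when the c available buckets form a uniformly random subset.

    y0, y1: potential outcomes at time t - delta; unit i lies in bucket i // N_B.
    Buckets are sampled at random within the subset, and treatment is completely
    randomized within the sample.
    """
    B, k = len(y0) // N_B, N_S // N_B
    total, count = Fraction(0), 0
    for subset in combinations(range(B), c):               # equally likely available sets
        for buckets in combinations(subset, k):            # bucket samples within the set
            units = [b * N_B + u for b in buckets for u in range(N_B)]
            for treated in combinations(units, N_S // 2):  # balanced assignments
                treated = set(treated)
                diff = sum(y1[i] if i in treated else -y0[i] for i in units)
                total += Fraction(diff, N_S // 2)
                count += 1
    return total / count

# Hypothetical toy population: N = 10 units in B = 5 buckets of N_B = 2 units.
y0 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y1 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
est = expected_dim_restricted(y0, y1, N_B=2, N_S=4, c=3)
ate = Fraction(sum(y1) - sum(y0), len(y0))
print(est, ate)  # 4/5 4/5
```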

Theorem 5

For a $\delta _{\,\,\perp \!\!\! \perp }$ fulfilling Condition 1, the bias in the difference-in-means estimator caused by sampling from the restricted set of buckets $\tilde{\mathcal {B}}_t$ is given by

$$\begin{aligned} ATE- E[\widehat{ATE^{\tilde{\mathcal {B}}_t}_t}]&= \widetilde{ATE}_{t-\delta _{\,\,\perp \!\!\! \perp }:t} - \widetilde{ATE}^{ \tilde{\mathcal {B}}_t}_{t-\delta _{\,\,\perp \!\!\! \perp }:t}. \end{aligned}$$

Proof

Note that from Definition 1 it follows that

$$\begin{aligned} ATE - E[\widehat{ATE^{\tilde{\mathcal {B}}_t}_t}]&= ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}+\widetilde{ATE}_{t-\delta _{\,\,\perp \!\!\! \perp }:t} - \left( ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}^{ \tilde{\mathcal {B}}_t} + \widetilde{ATE}_{t-\delta _{\,\,\perp \!\!\! \perp }:t}^{ \tilde{\mathcal {B}}_t}\right) . \end{aligned}$$

By Lemma 1, the difference-in-means estimator under random sampling of buckets from $\tilde{\mathcal {B}}_t$ is an unbiased estimator of $ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}$, which implies that $ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}^{ \tilde{\mathcal {B}}_t} = ATE_{t-\delta _{\,\,\perp \!\!\! \perp }} $. Substituting this directly gives

$$\begin{aligned} ATE - E[\widehat{ATE^{\tilde{\mathcal {B}}_t}_t}]&= ATE_{t-\delta _{\,\,\perp \!\!\! \perp }}+\widetilde{ATE}_{t-\delta _{\,\,\perp \!\!\! \perp }:t} - \left( ATE_{t-\delta _{\,\,\perp \!\!\! \perp }} + \widetilde{ATE}_{t-\delta _{\,\,\perp \!\!\! \perp }:t}^{ \tilde{\mathcal {B}}_t}\right) \\&= \widetilde{ATE}_{t-\delta _{\,\,\perp \!\!\! \perp }:t} - \widetilde{ATE}^{ \tilde{\mathcal {B}}_t}_{t-\delta _{\,\,\perp \!\!\! \perp }:t} \,\,\blacksquare \end{aligned}$$
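The decomposition in Theorem 5 can be illustrated with a deliberately simple toy model, sketched in Python below. This is not the simulation design of Sect. 5.2, and every modelling choice is an assumption made for illustration: bucket-level effects at time $t-\delta _{\,\,\perp \!\!\! \perp }$ are drawn from $N(0,1)$, a hypothetical earlier exclusive experiment that is still running at time $t$ occupies a uniformly random half of the buckets, and exposure to it shifts those buckets' effects by `gamma`. The occupied buckets are unavailable, so the available set is random with respect to the earlier effects but carries none of the drift. With equal bucket sizes, the expected difference-in-means estimate given the available set is the mean bucket effect over that set, so the simulated bias should match the population drift minus the drift in the available set, here `gamma / 2`.

```python
import random
from statistics import fmean

def theorem5_toy(B=200, gamma=0.4, draws=2000, seed=1):
    """Toy check of Theorem 5: bias = drift(population) - drift(available set).

    Hypothetical model: bucket effects at t - delta are N(0, 1); a still-running
    exclusive experiment occupies a random half of the buckets and shifts their
    effect by `gamma`; only the unoccupied buckets are available at time t.
    """
    rng = random.Random(seed)
    tau_prev = [rng.gauss(0.0, 1.0) for _ in range(B)]  # bucket ATEs at t - delta
    ate_t, expected_est, drift_pop, drift_avail = [], [], [], []
    for _ in range(draws):
        busy = set(rng.sample(range(B), B // 2))
        tau_t = [tau_prev[b] + (gamma if b in busy else 0.0) for b in range(B)]
        available = [b for b in range(B) if b not in busy]
        ate_t.append(fmean(tau_t))
        # Equal bucket sizes: E[difference-in-means | available set] = mean bucket ATE in the set.
        expected_est.append(fmean(tau_t[b] for b in available))
        drift_pop.append(fmean(tau_t[b] - tau_prev[b] for b in range(B)))
        drift_avail.append(fmean(tau_t[b] - tau_prev[b] for b in available))
    bias = fmean(ate_t) - fmean(expected_est)
    predicted = fmean(drift_pop) - fmean(drift_avail)
    return bias, predicted

bias, predicted = theorem5_toy()
print(f"simulated bias {bias:.3f}, Theorem 5 prediction {predicted:.3f}")
```

The toy model only mimics the structure of the theorem: availability is independent of the effects at $t-\delta _{\,\,\perp \!\!\! \perp }$, while the change between $t-\delta _{\,\,\perp \!\!\! \perp }$ and $t$ is tied to availability, which is exactly what makes the two drift terms differ.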

Simulations

Here, the simulation from Sect. 5.2 is repeated, but with 100 buckets instead of 10000. All simulations can be replicated using the Julia code in the supplementary files. The parameters are repeated in Table 3 for convenience. Figs. 8 and 9 display the simulation results. The patterns are the same as in Sect. 5.2, but the biases are generally larger for the same settings when the number of buckets is smaller. This is expected, as the heterogeneity is ‘fixed’ between these settings, in the sense that the ATEs for the first time points are drawn from the same normal distribution. As the number of buckets increases, more and more buckets therefore have similar values, which in turn decreases the heterogeneity between samples.

Table 3. Parameters of the Monte Carlo simulation. For parameters with lists of values, a value is randomly (uniformly) drawn when a new experiment is started.
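The heterogeneity argument above can be illustrated with a small Python sketch. It is not the authors' Julia code, and the sampling fraction of 10% and the standard normal distribution of bucket-level effects are assumptions chosen for illustration, not the values in Table 3. With the distribution of bucket-level ATEs held fixed, the spread of the mean ATE across repeated bucket samples shrinks as the number of buckets grows, roughly like $S\sqrt{(1-f)/k}$ for $k$ sampled buckets out of $B$, with $f = k/B$ and $S$ the standard deviation of the bucket-level ATEs.

```python
import random
from statistics import fmean, pstdev

def between_sample_sd(B, frac=0.1, reps=500, seed=7):
    """SD across repeated bucket samples of the mean bucket-level ATE.

    Hypothetical setup: B bucket ATEs drawn once from N(0, 1) (the 'fixed'
    heterogeneity), and a fixed fraction `frac` of the buckets sampled each time.
    """
    rng = random.Random(seed)
    bucket_ate = [rng.gauss(0.0, 1.0) for _ in range(B)]
    k = max(1, round(frac * B))
    sample_means = [fmean(rng.sample(bucket_ate, k)) for _ in range(reps)]
    return pstdev(sample_means)

sd_100 = between_sample_sd(100)
sd_10000 = between_sample_sd(10000)
print(f"B=100: {sd_100:.3f}   B=10000: {sd_10000:.3f}")
```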


About this paper

Cite this paper

Schultzberg, M., Kjellin, O., Rydberg, J. (2022). Statistical Properties of Exclusive and Non-exclusive Online Randomized Experiments Using Bucket Reuse. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2021, Volume 1. FTC 2021. Lecture Notes in Networks and Systems, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-030-89906-6_50


DOI: https://doi.org/10.1007/978-3-030-89906-6_50
Published: 24 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89905-9
Online ISBN: 978-3-030-89906-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
