Abstract
Designing experiments often requires balancing learning about the true treatment effects against earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics that are comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential of GI-based allocation designs to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
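To make the learning-versus-earning trade-off concrete, the following minimal sketch simulates a multi-armed experiment with exponentially distributed rewards under two allocation rules: fixed equal allocation and a simple adaptive rule. It does not implement the GI-modified rule of this chapter. As a stand-in it uses Thompson sampling with a Gamma prior on each arm's exponential rate, and it assumes that larger rewards are better. The function name `simulate_experiment`, the prior parameters, and the example means are all hypothetical choices made for illustration.

```python
import random


def simulate_experiment(true_means, n_participants, adaptive, rng,
                        prior_shape=1.0, prior_rate=1.0):
    """Simulate one experiment with exponential rewards.

    Returns (allocations per arm, total reward earned).
    Assumption: each arm's rate has a Gamma(prior_shape, prior_rate) prior,
    so after n rewards summing to s the posterior is
    Gamma(prior_shape + n, prior_rate + s).
    """
    k = len(true_means)
    counts = [0] * k   # participants allocated to each arm
    sums = [0.0] * k   # sum of observed rewards on each arm
    for t in range(n_participants):
        if adaptive:
            # Thompson sampling (stand-in rule, not the Gittins Index):
            # draw a rate from each posterior and pick the smallest one,
            # i.e. the arm with the largest sampled mean reward.
            # random.gammavariate takes (shape, scale), hence 1 / rate.
            sampled_rates = [
                rng.gammavariate(prior_shape + counts[a],
                                 1.0 / (prior_rate + sums[a]))
                for a in range(k)
            ]
            arm = min(range(k), key=lambda a: sampled_rates[a])
        else:
            # Non-adaptive design: rotate through the arms equally.
            arm = t % k
        # Exponential reward with the arm's true mean (rate = 1 / mean).
        reward = rng.expovariate(1.0 / true_means[arm])
        counts[arm] += 1
        sums[arm] += reward
    return counts, sum(sums)


if __name__ == "__main__":
    rng = random.Random(2023)
    for adaptive in (False, True):
        counts, total = simulate_experiment([1.0, 3.0], 100, adaptive, rng)
        label = "adaptive" if adaptive else "equal allocation"
        print(f"{label}: allocations={counts}, total reward={total:.1f}")
```

Averaging the allocations and total rewards over many replicates gives a rough picture of the earning side of the trade-off. Assessing the learning side (for example, statistical power) would additionally require a hypothesis test applied to each simulated data set, which this sketch omits.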
Data Availability Statement
The results presented in this work were based solely on simulated data. The code used for generating the data and the subsequent analysis is available through the GitHub repository: https://github.com/james-helium/gittins_adaptive_trials.
Acknowledgment
JKH thanks the University of Cambridge MRC Biostatistics Unit Summer Internship Program for the support and training that made this research possible. LM and SSV acknowledge funding and support from the UK Medical Research Council (MC_UU_00002/15).
Contributions
LM and SSV defined the internship project proposal that led to this work; LM supervised the internship work. JKH and LM designed the simulation studies; JKH performed the simulations and analysed the results; JKH drafted the manuscript; JKH, LM and SSV contributed to the writing and editing of the manuscript.
Rights Retention Statement
For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
He, J.K., Villar, S.S., Mavrogonatou, L. (2023). Computing the Performance of a New Adaptive Sampling Algorithm Based on The Gittins Index in Experiments with Exponential Rewards. In: Arai, K. (eds) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol 711. Springer, Cham. https://doi.org/10.1007/978-3-031-37717-4_10
DOI: https://doi.org/10.1007/978-3-031-37717-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37716-7
Online ISBN: 978-3-031-37717-4
eBook Packages: Intelligent Technologies and Robotics (R0)