Abstract
In the last decade, laboratory experiments have often been replaced by online experiments. This trend was reinforced when academic and research work based on physical interaction had to be suspended due to restrictions imposed to limit the spread of Covid-19. As a consequence, the quality of data and results from web experiments has become a topic of active investigation: are there significant differences between laboratory and online findings? We contribute to this debate via an experiment comparing results from a novel online protocol with those from a traditional laboratory setting, using the same pool of participants. We find that participants behave in a similar way across settings and that differences in behavior between our online protocol and the physical laboratory setting are at best weakly significant and quantitatively small.
1 Introduction
The spread of Covid-19 temporarily prevented experimental subjects from physically entering labs. Still, the experimental approach remains a crucial tool to understand individual and group behavior. To overcome the problems raised by physical distancing, researchers have turned to online experiments and surveys employing different platforms. The validity of these protocols has been demonstrated by successfully replicating a series of classic experiments (Crump et al. 2013; Amir et al. 2012; Horton et al. 2013). Moreover, a recent strand of research focuses on comparing the quality of data and the reliability of results across different platforms and pools of subjects (see, e.g., Gupta et al. 2021; Peer et al. 2021; Litman et al. 2021).
Online experiments differ from physical ones in ways that limit the benefits of fundamental aspects of the traditional experimental method. A first issue concerns subjects dropping out during the experiment: dropouts are problematic both because they may result in (expensive) losses due to discarded observations and because they might be endogenous (Arechar et al. 2018). A second issue concerns participants' limited attention, which could hinder the understanding of instructions, due to limited control: Chandler et al. (2014) show that subjects may engage in other activities while participating in an online experiment (e.g., watching TV, listening to music, chatting). A third issue concerns the difficulty of controlling the recruiting process.
During the pandemic, we developed a novel online protocol that replicates the main features of physical experiments and therefore addresses the most relevant problems mentioned above (see Buso et al. 2020). In particular, it ensures: (i) isolated and monitored subjects, (ii) interactions mediated by computers, (iii) anonymity of participants, (iv) immediate monetary reward, and (v) the same recruiting process as in the physical lab, which allows for better control and ensures that participants are drawn from the standard sample.
To contribute to the current debate comparing web experimental datasets with those collected in the traditional physical lab, in October 2021 we collected data on three standard games (Ultimatum, Dictator, and Public Good Game) in traditional physical lab sessions and in two types of online sessions, with and without video monitoring of participants. The different data collection settings identify our three treatments, hereinafter referred to as Physical Lab; Online, monitoring; and Online, no monitoring. We find that participants in our experiment behave in a similar way across settings and that differences in choice data between online and physical lab sessions are at best weakly significant and quantitatively small. Therefore, we confirm the validity of our protocol for online experiments and its capability to overcome the aforementioned issues.
The paper is organized as follows: in Sect. 2, we present our online protocol; in Sect. 3, we describe the experimental design; in Sect. 4, we present the results, comparing online and physical lab evidence; we conclude in Sect. 5. In the supplementary online materials, we report the translated instructions (online Appendix A), the instructions in the original language (online Appendix B) and post-experimental questionnaire (online Appendix C), together with additional material regarding our protocol (online Appendix D and online Appendix E).
2 Experimental protocol
The online visually monitored sessions are organized as follows: we adopt an architecture of connected platforms, specifically ORSEE for recruitment (Greiner 2015), Cisco WebEx for (visual) monitoring, oTree (Chen et al. 2016) for running the experiment, and PayPal for payments. In the invitation (see online Appendix D), we remind participants that a PayPal account is necessary to participate and to receive the final payment. For privacy reasons, participants are informed that during the experiment they will be connected, but not recorded, via audio/video with the experimenter for the whole session and that, therefore, they need a suitable device. Participants are also informed that supernumerary subjects will be paid only the show-up fee. Before the beginning of the session, the experimenter randomly allocates registered participants to individual virtual cubicles created using Cisco WebEx, sending them the corresponding links. During the experiment, participants are monitored via webcam and can communicate privately with the experimenter via chat and microphone. They can neither see nor talk to each other, while the experimenter can talk publicly to all participants. A picture of the experimenter's screen is provided in online Appendix E. As participants log in to Cisco WebEx, the experimenter checks that their webcam and microphone work properly, as well as the overall quality of their internet connection. After completing these checks, the experimenter communicates the access procedure to participants and sends them individual and anonymous oTree links. After log-in, participants input their PayPal account in oTree, which will be used for payments.Footnote 1 As soon as all participants are ready, the experimenter plays a prerecorded audio file with the instructions read aloud for all participants, which preserves common awareness and reduces session effects.
Written instructions are also displayed on participants’ screens while the audio recording is playing and remain available, by clicking a dedicated button, during the whole experiment. At the end of the session, participants answer a final questionnaire. Once participants complete the questionnaire, they are shown a receipt with their payment data and leave their virtual cubicle.
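As a minimal illustration of the allocation step described above (our sketch, not the authors' software), the random assignment of registered participants to virtual cubicles and the generation of individual anonymous links can be written in Python; the function names, participant IDs, and the `example.org` base URL are hypothetical placeholders:

```python
import random
import secrets

def allocate_cubicles(participants, n_cubicles):
    """Randomly map each registered participant to one virtual cubicle."""
    assert len(participants) <= n_cubicles, "not enough cubicles"
    cubicles = random.sample(range(1, n_cubicles + 1), k=len(participants))
    return dict(zip(participants, cubicles))

def anonymous_links(participants, base_url="https://example.org/room"):
    """Generate an individual, anonymous link per participant; the link
    carries no identifying information about the subject."""
    return {p: f"{base_url}/{secrets.token_urlsafe(8)}" for p in participants}

roster = ["subj01", "subj02", "subj03"]
print(allocate_cubicles(roster, n_cubicles=5))
print(anonymous_links(roster))
```

Because each link is tied to the session rather than to a browser state, a participant who loses connection can re-open the same link and re-join, which is the feature the protocol relies on to reduce involuntary dropouts.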
The non-monitored sessions follow the same protocol, excluding the video connection but preserving the possibility for participants and experimenters to communicate via audio or chat. The physical lab sessions follow the traditional protocol for experiments, as described, for example, by Weimann and Brosig-Koch (2019).
As mentioned in the introduction, we believe our protocol addresses the most common issues of online experiments: (i) it reduces involuntary dropouts (since oTree links allow participants to re-join the session and continue the experiment) and voluntary ones (by constantly monitoring participants and communicating with them through webcam and microphone); (ii) it mitigates limited attention by having the instructions read aloud to all participants before the experiment begins and by reducing participants' engagement in other activities; (iii) it controls for participants' characteristics via recruiting on ORSEE.Footnote 2
3 Experimental design
The experiment features three sequences, varying the order (between sessions) of a one-shot dictator game (DG), ultimatum game (UG), and public good game (PGG), all played without feedback. In DG and UG, the proposer's endowment is 20 tokens. Subjects play using the strategy method with role reversal, indicating their offer as proposers in DG and UG, and the minimum amount they would accept as receivers (i.e., the rejection threshold) in UG. In PGG, each participant's endowment is 10 tokens and the MPCR (marginal per capita return) is equal to 0.5. Participants play in groups of four and choose their own contribution to the public good.
Subjects are informed that, at the end of the experiment, one game is randomly selected for payment (1 token = 1 euro). In DG and UG, each participant is randomly matched with another subject and randomly assigned to a role (either proposer or receiver). In PGG, each participant is randomly assigned to a group with 3 other subjects. This matching procedure, which is performed after participants face the three games, together with the absence of feedback information, guarantees the independence of individual choices. The experiment was programmed in oTree (Chen et al. 2016), and subjects were paid in cash after the experiment in the physical lab and via PayPal after the online sessions.
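Under the parameters stated above, the payoff rules of the three games can be sketched as follows (our hedged illustration; the function names are ours, not part of the experimental software):

```python
MPCR = 0.5          # marginal per capita return in the PGG
PIE = 20            # proposer endowment in DG and UG (tokens)
PGG_ENDOWMENT = 10  # individual endowment in the PGG (tokens)

def dg_payoffs(demand):
    """Dictator keeps `demand` tokens; the matched receiver gets the rest."""
    return demand, PIE - demand

def ug_payoffs(demand, threshold):
    """Strategy method: the offer (PIE - demand) is accepted iff it is at
    least the receiver's rejection threshold; otherwise both earn 0."""
    offer = PIE - demand
    return (demand, offer) if offer >= threshold else (0, 0)

def pgg_payoff(own_contribution, group_contributions):
    """Own payoff in the 4-person PGG: the kept endowment plus MPCR times
    the total group contribution (own contribution included)."""
    return PGG_ENDOWMENT - own_contribution + MPCR * sum(group_contributions)
```

For example, with an MPCR of 0.5 and groups of four, full contribution by everyone yields 0.5 × 40 = 20 tokens per head, twice the endowment, while free riding on three full contributors yields 10 + 0.5 × 30 = 25 tokens, which is the usual social-dilemma tension.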
The experiment is composed of 9 sessions run between October 15 and October 22, 2021 with a total of 183 participants, students from LUISS Guido Carli University recruited via ORSEE (Greiner 2015). In particular, we ran three sessions in the physical lab, three online sessions with visually monitored subjects, and three online sessions without visual monitoring. Moreover, for each setting we varied the order of the three games across sessions, so that in each treatment we have (i) one PGG-DG-UG session, (ii) one DG-UG-PGG session, and (iii) one UG-PGG-DG session. Sessions in the physical lab were run at CESARE Lab with 60 participants, online sessions involved 63 participants visually monitoredFootnote 3 and 60 non-monitored participants.Footnote 4
Table 1 shows that the composition of the sample across treatments is balanced in terms of demographic characteristics. The dummy Economics equals 1 when the participant is a student of Economics. Self-reported RA is the self-reported willingness to take risks in general on a {0, 1, ..., 10} scale, where 0 identifies a risk-averse subject and 10 a risk-loving one.Footnote 5 The dummy Resident equals 1 when the participant comes from the Italian region where LUISS Guido Carli University is located.Footnote 6 The dummy Center equals 1 when the participant comes from Central Italy rather than other areas. The dummy Easy equals 1 when the participant declared that (s)he found the experiment easy.Footnote 7
4 Results
In this section, we discuss the experimental results with evidence from the three games. We first present some descriptive results, and then the econometric analysis.
Figure 1 reports average choices for each of the three games by treatment with confidence intervals and shows that, overall, between-treatment differences are negligible. In DG, average demand amounts to 73.2% (14.64 tokens) of the pie size (20 tokens): 71.3% (14.267 tokens) in the physical lab, 77.5% (15.508 tokens) online with visual monitoring, and 70.5% (14.117 tokens) online without visual monitoring. In UG, average proposer demand amounts to 61.35% (12.27 tokens) of the pie size, and in particular is equal to 59.8% in the physical lab, 60.6% online with visual monitoring, and 63.75% online without visual monitoring. The average responder rejection threshold amounts to 37.5% (7.5 tokens) of the pie size: 35.8% in the physical lab, 38.25% online with visual monitoring, and 38.4% online without visual monitoring. In PGG, average contribution amounts to 32% (3.18 tokens) of the endowment (10 tokens): 36% in the physical lab, 27.5% online with visual monitoring, and 32.3% online without visual monitoring.
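The token-to-percentage conversions used throughout this paragraph are simply the average choice divided by the pie (or endowment); a quick check of the exact figures (our snippet, not the authors' analysis code):

```python
def share_of_pie(tokens, pie):
    """Average choice in tokens expressed as a percentage of the pie,
    rounded to two decimals as in the text."""
    return round(100 * tokens / pie, 2)

assert share_of_pie(14.64, 20) == 73.2    # DG average demand
assert share_of_pie(12.27, 20) == 61.35   # UG average proposer demand
assert share_of_pie(7.5, 20) == 37.5      # UG average rejection threshold
```

(The treatment-level figures in the text are rounded to one decimal, so small discrepancies at the last digit reflect rounding.)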
To verify whether there are significant differences in choice data between online and physical lab sessions, we run two sets of regressions. In both analyses, Physical Lab and Online, no monitoring are compared with the baseline, i.e. Online, monitoring.
The first set of regressions checks whether the between-treatment differences observed in Fig. 1 are statistically significant. Individual choices in each game are analysed separately via OLS regressions using treatment dummies as covariates. The results, reported in Table 2, confirm the absence of treatment effects for both UG choices. For DG demand, we find a weakly significant difference between the two online treatments; for PGG contributions, a weakly significant difference between the physical lab and the baseline.Footnote 8
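As a stylized illustration with made-up numbers (not the experimental data), note that in an OLS regression of a choice on treatment dummies alone, with Online, monitoring as the omitted category, the intercept equals the baseline group mean and each dummy coefficient equals the difference between that treatment's mean choice and the baseline mean:

```python
from statistics import mean

# Hypothetical DG demands by treatment (illustrative values only)
choices = {
    "Online, monitoring":    [15, 16, 14, 17],  # baseline group
    "Physical Lab":          [14, 13, 15, 14],
    "Online, no monitoring": [14, 15, 13, 14],
}

# OLS with only treatment dummies reduces to group-mean comparisons:
baseline = mean(choices["Online, monitoring"])
coefficients = {
    treatment: mean(values) - baseline
    for treatment, values in choices.items()
    if treatment != "Online, monitoring"
}
print(f"intercept (baseline mean): {baseline}")
print(coefficients)
```

The regressions in Table 2 of course also deliver standard errors for these differences, which is what the significance statements refer to.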
In the second set of OLS regressions, we expand the set of covariates with dummies indicating the sequence in which the games were played (with the sequence DG-UG-PGG used as baseline) and individual characteristics. The latter include participants' gender and age, whether they reside in Central Italy,Footnote 9 and self-reported risk attitude. Results are reported in Table 3 and confirm the treatment effects observed in Table 2. Furthermore, we find that contributions to the PGG are significantly higher when the game is played first and significantly lower for students of Economics.
5 Conclusion
We compare lab-data collected in physical and online labs using participants from the same pool of subjects to validate our lab-like methodology that ensures (visual) monitoring, common reading of instructions, and isolation of participants.
Results from UG, DG, and PGG show only one weakly significant difference in one of the choices between the physical lab and the online setting with visual monitoring, and no significant differences between the physical lab and the online setting without visual monitoring. Therefore, data generated on the web with our protocol are comparable with data collected in the physical lab. Furthermore, we find only one weakly significant difference in one of the choices between the online settings with and without visual monitoring.Footnote 10 Overall, we have provided a validation of our protocol, which reduces the debated side effects arising from online experiments.
Moreover, since our protocol is based on recruitment from a pool of registered students, for example via ORSEE, it allows the experimenter to control for their characteristics, such as experience, gender, or field of study. More importantly, it makes it possible to run a new strand of experiments based on the interaction of participants located in different geographical areas, and therefore embedded in their own cultural environments, as if they were simultaneously in the same physical lab.
Notes
The software stores PayPal accounts in a file separate from that of the experimental decisions, so as to preserve anonymity.
Similar platforms can be easily adapted to the characteristics of the lab (e.g., IT resources and administrative constraints). For instance, Prolific could be used for payments and subject recruitment, while alternative software such as LIONESS Lab (see Giamattei et al. 2020), Veconlab (see http://veconlab.econ.virginia.edu/), zTree Unleashed (see Duch et al. 2020), could be used for the experiment.
One participant was excluded because of technical issues.
Subjects received the same invitation for both monitored and non-monitored online sessions, stating that a webcam would be needed for the initial identification phase and during the experiment, and that registration implied acceptance of being monitored. In non-monitored sessions, subjects were asked to turn off the webcam after identification; using the same invitation for both session types avoids self-selection into monitored and non-monitored sessions.
Specifically, subjects answered the following question (see Dohmen et al. 2011): “Are you a person generally willing to face risks or do you prefer to avoid facing them? Please express one preference on a 0–10 scale, where 0 means ‘I do not want to take any risk’ and 10 means ‘I am very willing to take risks’.”
With this variable we aim at distinguishing the students who likely live with their families from those living with other students or workers.
The translated post-experimental questionnaire is reported in the supplementary online materials (online Appendix C).
Two-sample t tests reveal no treatment effect for any choice when comparing the physical lab setting and the online setting without monitoring.
Results are not affected if we substitute this variable with Resident.
This result is not directly comparable to those of Gupta et al. (2021), since they compare different populations commonly employed in economic experiments.
References
Amir, O., Rand, D., & Gal, Y. (2012). Economic games on the internet: The effect of $1 stakes. PLoS One. https://doi.org/10.1371/journal.pone.0031461.
Arechar, A., Gächter, S., & Molleman, L. (2018). Conducting interactive experiments online. Experimental Economics, 21(1), 99–131. https://doi.org/10.1007/s10683-017-9527-2.
Buso, I. M., De Caprariis, S., Di Cagno, D., Ferrari, L., Larocca, V., Marazzi, F., Panaccione, L., & Spadoni, L. (2020). The effects of Covid-19 lockdown on fairness and cooperation: Evidence from a lablike experiment. Economics Letters, 196. https://EconPapers.repec.org/RePEc:eee:ecolet:v:196:y:2020:i:c:s0165176520303487
Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaiveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46, 112–130. https://doi.org/10.3758/s13428-013-0365-7.
Chen, D., Schonger, M., & Wickens, C. (2016). oTree: An open-source platform for laboratory, online, and field experiments. Journal of Behavioral and Experimental Finance, 9, 88–97. https://EconPapers.repec.org/RePEc:eee:beexfi:v:9:y:2016:i:c:p:88-97
Crump, M., McDonnell, J., & Gureckis, T. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLoS One, 8(3), e57410. https://doi.org/10.1371/journal.pone.0057410.
Dohmen, T., Falk, A., Huffman, D., Sunde, U., Schupp, J., & Wagner, G. G. (2011). Individual risk attitudes: Measurement, determinants, and behavioral consequences. Journal of the European Economic Association, 9(3), 522–550. https://doi.org/10.1111/j.1542-4774.2011.01015.x.
Duch, M. L., Grossmann, M. R. P., & Lauer, T. (2020). z-Tree unleashed: A novel client-integrating architecture for conducting z-Tree experiments over the internet. Journal of Behavioral and Experimental Finance, 28(3), 100400.
Giamattei, M., Yahosseini, K. S., Gächter, S., & Molleman, L. (2020). LIONESS Lab: A free web-based platform for conducting interactive experiments online. Journal of the Economic Science Association, 6(1), 95–111. https://doi.org/10.1007/s40881-020-00087-0.
Greiner, B. (2015). Subject pool recruitment procedures: Organizing experiments with ORSEE. Journal of the Economic Science Association, 1(1), 114–125. https://doi.org/10.1007/s40881-015-0004-4.
Gupta, N., Rigotti, L., & Wilson, A. (2021). The experimenters’ dilemma: Inferential preferences over populations. arXiv:2107.05064
Horton, J., Rand, D., & Zeckhauser, R. (2013). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14, 399–425. https://doi.org/10.1007/s10683-011-9273-9.
Litman, L., Moss, A., Rosenzweig, C., & Robinson, J. (2021). Reply to MTurk, Prolific or panels? Choosing the right audience for online research. https://doi.org/10.2139/ssrn.3775075
Peer, E., Rothschild, D., Gordon, A., Evernden, Z., & Damer, E. (2021). Data quality of platforms and panels for online behavioral research. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01694-3.
Weimann, J., & Brosig-Koch, J. (2019). Methods in Experimental Economics. Springer.
Acknowledgements
We thank the Editors, Maria Bigoni and Dirk Engelmann, and two anonymous referees for useful comments. We also thank Sofia De Caprariis for her assistance during this project.
Cite this article
Buso, I.M., Di Cagno, D., Ferrari, L. et al. Lab-like findings from online experiments. J Econ Sci Assoc 7, 184–193 (2021). https://doi.org/10.1007/s40881-021-00114-8