1 Introduction

The spread of Covid-19 has temporarily prevented experimental subjects from physically entering labs. Still, the experimental approach remains a crucial tool for understanding individual and group behavior. To overcome the problems raised by physical distancing, researchers have turned to online experiments and surveys run on a variety of platforms. The validity of these protocols has been demonstrated by successfully replicating a series of classic experiments (Crump et al. 2013; Amir et al. 2012; Horton et al. 2013). Moreover, a recent strand of research compares the quality of data and the reliability of results across different platforms and subject pools (see, e.g., Gupta et al. 2021; Peer et al. 2021; Litman et al. 2021).

Online experiments differ from physical ones in ways that limit the benefits of fundamental aspects of the traditional experimental method. A first issue concerns subjects dropping out during the experiment: dropouts are problematic both because they may result in (expensive) losses from discarded observations and because they might be endogenous (Arechar et al. 2018). A second issue concerns participants' limited attention, which could hinder the understanding of instructions: because experimenters have limited control, subjects may engage in other activities while participating in an online experiment (e.g., watching TV, listening to music, chatting), as Chandler et al. (2014) show. A third issue concerns the difficulty of controlling the recruiting process.

During the pandemic, we developed a novel online protocol that replicates the main features of physical experiments and therefore addresses the most relevant problems mentioned above (see Buso et al. 2020). In particular, it ensures: (i) isolated and monitored subjects, (ii) interactions mediated by computers, (iii) anonymity of participants, (iv) immediate monetary reward, and (v) the same recruiting process as in the physical lab, which allows for better control and ensures that participants are drawn from the standard sample.

To contribute to the current debate comparing web experimental datasets with those collected in the traditional physical lab, in October 2021 we collected data on three standard games (Ultimatum, Dictator, and Public Good Game) in traditional physical lab sessions and in two types of online sessions, with and without video monitoring of participants. The different data-collection settings identify our three treatments, hereinafter referred to as Physical Lab; Online, monitoring; and Online, no monitoring. We find that participants behave similarly across settings, with at best weakly significant and quantitatively small differences in choice data between online and physical lab sessions. Therefore, we confirm the validity of our protocol for online experiments and its capability to overcome the aforementioned issues.

The paper is organized as follows: in Sect. 2, we present our online protocol; in Sect. 3, we describe the experimental design; in Sect. 4, we present the results, comparing online and physical lab evidence; we conclude in Sect. 5. In the supplementary online materials, we report the translated instructions (online Appendix A), the instructions in the original language (online Appendix B), and the post-experimental questionnaire (online Appendix C), together with additional material regarding our protocol (online Appendices D and E).

2 Experimental protocol

The online visually monitored sessions are organized as follows. We adopt an architecture of connected platforms: ORSEE for recruitment (Greiner 2015), Cisco WebEx for (visual) monitoring, oTree (Chen et al. 2016) for running the experiment, and PayPal for payments. In the invitation (see online Appendix D), we remind participants that a PayPal account is necessary to participate and receive the final payment. For privacy reasons, participants are informed that during the experiment they will be connected, but not recorded, via audio/video with the experimenter for the whole session and, therefore, that they need a suitable device. Participants are also informed that supernumerary subjects will be paid only the show-up fee.

Before the beginning of the session, the experimenter randomly allocates registered participants to individual virtual cubicles created using Cisco WebEx, sending them the corresponding links. During the experiment, participants are monitored via webcam and can communicate privately with the experimenter via chat and microphone. They can neither see nor talk to each other, while the experimenter can talk publicly to all participants. A picture of the experimenter's screen is provided in online Appendix E. As participants log in to Cisco WebEx, the experimenter checks that their webcam and microphone work properly, as well as the overall quality of their internet connection. After completing these checks, the experimenter communicates the access procedure to participants and sends them individual, anonymous oTree links. After logging in, participants enter in oTree the PayPal account that will be used for their payment.¹

As soon as all participants are ready, the experimenter plays a prerecorded audio file with the instructions read aloud to all participants, which preserves common awareness and reduces session effects. Written instructions are also displayed on participants' screens while the audio recording plays and remain available, via a dedicated button, throughout the experiment. At the end of the session, participants answer a final questionnaire. Once they complete it, they are shown a receipt with their payment data and leave their virtual cubicle.
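To make the oTree step concrete, the following is a minimal sketch, in the style of an oTree 5 app, of the log-in page on which participants enter their PayPal account. It is an illustration under our own assumptions: the app name, the field name, and the label are hypothetical, not the authors' actual code.

```python
# Minimal oTree 5 sketch (hypothetical): after opening their individual link,
# each participant enters the PayPal account used for the final payment.
from otree.api import *


class C(BaseConstants):
    NAME_IN_URL = 'checkin'      # hypothetical app name
    PLAYERS_PER_GROUP = None
    NUM_ROUNDS = 1


class Subsession(BaseSubsession):
    pass


class Group(BaseGroup):
    pass


class Player(BasePlayer):
    # Stored once per participant and used later for the PayPal payment.
    paypal_account = models.StringField(
        label='Please enter the PayPal account to be used for your payment.'
    )


class CheckIn(Page):
    form_model = 'player'
    form_fields = ['paypal_account']


page_sequence = [CheckIn]
```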

The non-monitored sessions follow the same protocol, excluding the video connection but preserving the possibility for participants and experimenters to communicate via audio or chat. The physical lab sessions follow the traditional protocol for experiments, as described, for example, by Weimann and Brosig-Koch (2019).

As mentioned in the introduction, we believe our protocol addresses the most common issues of online experiments: (i) it reduces involuntary dropouts (since the oTree links allow participants to re-join the session and continue the experiment) and voluntary ones (by constantly monitoring participants and communicating with them through webcam and microphone); (ii) it mitigates limited attention by playing the instructions aloud to all participants simultaneously before the experiment begins and by reducing participants' engagement in other activities; (iii) it controls for participants' characteristics via recruitment on ORSEE.²

3 Experimental design

The experiment features the one-shot dictator game (DG), ultimatum game (UG), and public good game (PGG), played without feedback in three different orders (varied between sessions). In DG and UG, the proposer's endowment is 20 tokens. Subjects play using the strategy method and role reversal, indicating their offer as proposer in DG and UG, and the minimum amount they accept as receiver (i.e., the rejection threshold) in UG. In PGG, each participant's endowment is 10 tokens and the MPCR is equal to 0.5; participants play in groups of four, each reporting their own contribution to the public good.
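For concreteness, the four decision variables just described could be stored in oTree roughly as follows. This is a sketch under our assumptions: the field names are ours, the bounds follow the endowments in the text, and the usual app boilerplate (constants, subsession, group) is omitted.

```python
from otree.api import *

# Hypothetical field definitions mirroring the design: 20-token pie in DG/UG,
# strategy-method rejection threshold in UG, 10-token endowment in PGG.
# (Constants, Subsession, and Group classes as in a standard oTree app.)
class Player(BasePlayer):
    dg_demand = models.IntegerField(min=0, max=20)         # tokens kept by the DG proposer
    ug_demand = models.IntegerField(min=0, max=20)         # tokens demanded by the UG proposer
    ug_threshold = models.IntegerField(min=0, max=20)      # minimum acceptable offer (UG receiver)
    pgg_contribution = models.IntegerField(min=0, max=10)  # contribution to the public good
```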

Subjects are informed that, at the end of the experiment, one game is randomly selected for payment (1 token = 1 euro). In DG and UG, each participant is randomly matched with another subject and randomly assigned a role (either proposer or receiver). In PGG, each participant is randomly assigned to a group with three other subjects. This matching procedure, performed after participants have played all three games, together with the absence of feedback, guarantees the independence of individual choices. The experiment was programmed in oTree (Chen et al. 2016), and subjects were paid in cash after the experiment in the physical lab and via PayPal after the online sessions.
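As a sketch of the payoff rules implied by this design (the token amounts and the MPCR are taken from the text; the function names and the demand/threshold encoding are our assumptions):

```python
import random

PIE = 20            # DG/UG pie, in tokens (1 token = 1 euro)
PGG_ENDOWMENT = 10  # PGG endowment, in tokens
MPCR = 0.5          # marginal per-capita return in the PGG


def ug_payoffs(demand, threshold):
    """Proposer keeps `demand`; the receiver gets the remainder if it meets
    the strategy-method threshold, otherwise both earn zero."""
    offer = PIE - demand
    if offer >= threshold:
        return demand, offer
    return 0, 0


def pgg_payoff(own_contribution, group_contributions):
    """Each member keeps the uncontributed endowment plus the MPCR times the
    total contribution of the four-member group (own contribution included)."""
    return PGG_ENDOWMENT - own_contribution + MPCR * sum(group_contributions)


# At the end of the experiment, one of the three games is drawn for payment.
paid_game = random.choice(['DG', 'UG', 'PGG'])
```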

The experiment comprises nine sessions run between October 15 and October 22, 2021, with a total of 183 participants, students from LUISS Guido Carli University recruited via ORSEE (Greiner 2015). In particular, we ran three sessions in the physical lab, three online sessions with visually monitored subjects, and three online sessions without visual monitoring. Moreover, for each setting we varied the order of the three games across sessions, so that each treatment includes (i) one PGG-DG-UG session, (ii) one DG-UG-PGG session, and (iii) one UG-PGG-DG session. Sessions in the physical lab were run at CESARE Lab with 60 participants; the online sessions involved 63 visually monitored³ and 60 non-monitored participants.⁴

Table 1 Balance table of participants’ characteristics by treatment

Table 1 shows that the composition of the sample across treatments is balanced in terms of demographic characteristics. The dummy Economics equals 1 when the participant is a student of Economics. Self-reported RA is the self-reported willingness to take risks in general on a {0, 1, ..., 10} scale, where 0 identifies a risk-averse and 10 a risk-loving subject.⁵ The dummy Resident equals 1 when the participant comes from the Italian region where LUISS Guido Carli University is located.⁶ The dummy Center equals 1 when the participant comes from Central Italy rather than other areas. The dummy Easy equals 1 when the participant declared that (s)he found the experiment easy.⁷
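As an illustration of how these variables could be built from raw questionnaire data, consider the following pandas sketch; all column names and codings are hypothetical, not the authors' actual data.

```python
import pandas as pd

# Hypothetical raw questionnaire data, one row per participant.
df = pd.DataFrame({
    'field_of_study': ['Economics'],
    'home_region': ['Lazio'],
    'macro_area': ['Center'],
    'found_experiment_easy': [True],
})

UNIVERSITY_REGION = 'Lazio'  # LUISS Guido Carli is based in Rome (Lazio)

df['economics'] = (df['field_of_study'] == 'Economics').astype(int)
df['resident'] = (df['home_region'] == UNIVERSITY_REGION).astype(int)
df['center'] = (df['macro_area'] == 'Center').astype(int)
df['easy'] = df['found_experiment_easy'].astype(int)
# Self-reported RA is kept as-is on its 0-10 scale.
```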

4 Results

In this section, we discuss the experimental results with evidence from the three games. We first present some descriptive results, and then the econometric analysis.

Figure 1 reports average choices, with confidence intervals, for each of the three games by treatment, and shows that, overall, between-treatment differences are negligible. In DG, average demand amounts to 73.2% (14.64 tokens) of the pie (20 tokens): 71.3% (14.267 tokens) in the physical lab, 77.5% (15.508) online with visual monitoring, and 70.5% (14.117) online without visual monitoring. In UG, the average proposer demand amounts to 61.35% (12.27 tokens) of the pie and equals 59.8% in the physical lab, 60.6% online with visual monitoring, and 63.75% online without visual monitoring. The average responder rejection threshold amounts to 37.5% (7.5 tokens) of the pie: 35.8% in the physical lab, 38.25% online with visual monitoring, and 38.4% online without visual monitoring. In PGG, the average contribution amounts to 32% (3.18 tokens) of the endowment (10 tokens) and equals 36% in the physical lab, 27.5% online with visual monitoring, and 32.3% online without visual monitoring.

Fig. 1 Average choices by games and treatments

To verify whether there are significant differences in choice data between online and physical lab sessions, we run two sets of regressions. In both analyses, Physical Lab and Online, no monitoring are compared with the baseline, i.e., Online, monitoring.

The first set of regressions checks whether the between-treatment differences observed in Fig. 1 are statistically significant. Individual choices in each game are analysed separately via OLS regressions using treatment dummies as covariates. The results, reported in Table 2, confirm the absence of treatment effects for both UG choices. For DG demand and PGG contributions, we find weakly significant effects between the two online treatments and between the physical lab and the baseline, respectively.⁸
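A minimal sketch of this first specification, assuming a participant-level data frame `df` with one column per choice and a `treatment` label (statsmodels/patsy syntax; all variable names are our own):

```python
import statsmodels.formula.api as smf

# df: one row per participant (assumed to exist), with columns 'dg_demand',
# 'ug_demand', 'ug_threshold', 'pgg_contribution', and a 'treatment' label
# taking values 'physical_lab', 'online_monitoring', 'online_no_monitoring'.
for choice in ['dg_demand', 'ug_demand', 'ug_threshold', 'pgg_contribution']:
    fit = smf.ols(
        f"{choice} ~ C(treatment, Treatment(reference='online_monitoring'))",
        data=df,
    ).fit()
    print(fit.summary())  # treatment dummies measure deviations from the baseline
```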

Table 2 Results of OLS regression with individual choices as dependent variables

In the second set of OLS regressions, we expand the set of covariates with dummies indicating the sequence in which the games were played (with the sequence DG-UG-PGG as the baseline) and individual characteristics. The latter include participants' gender and age, whether they reside in Central Italy,⁹ and self-reported risk attitude. Results are reported in Table 3 and confirm the treatment effects observed in Table 2. Furthermore, contributions to the public good are significantly higher when the PGG is played first and significantly lower among students of Economics.
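In the same hypothetical notation as above, the expanded specification adds sequence dummies (baseline DG-UG-PGG) and the individual controls; we also include the Economics dummy, which appears among the Table 3 results:

```python
import statsmodels.formula.api as smf

controls = (
    " + C(sequence, Treatment(reference='DG_UG_PGG'))"  # order of the games
    " + female + age + center + self_reported_ra + economics"
)
fit = smf.ols(
    "pgg_contribution ~ C(treatment, Treatment(reference='online_monitoring'))"
    + controls,
    data=df,  # same hypothetical participant-level data frame as above
).fit()
```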

Table 3 Results of OLS regression with individual choices as dependent variables with sequence and demographic controls

5 Conclusion

We compare data collected in the physical lab and online, using participants from the same subject pool, to validate our lab-like methodology, which ensures (visual) monitoring, common reading of instructions, and isolation of participants.

Results from the UG, DG, and PGG show only one weakly significant difference, in one of the choices, between the physical lab and the online setting with visual monitoring, and no significant differences between the physical lab and the online setting without visual monitoring. Therefore, data generated on the web with our protocol are comparable with data collected in the physical lab. Furthermore, we find only one weakly significant difference, in one of the choices, between the online settings with and without visual monitoring.¹⁰ Overall, we have provided a validation of our protocol, which reduces the debated side effects arising from online experiments.

Moreover, since our protocol is based on recruitment from a pool of registered students, for example via ORSEE, it makes it possible to control for their characteristics, such as experience, gender, or field of study. More importantly, it makes it possible to run a new strand of experiments based on the interaction of participants located in different geographical areas, and therefore embedded in their own cultural environments, as if they were simultaneously in the same physical lab.