1 Introduction

Preventing climate change and global warming are among the biggest challenges of our time. In 2015, international climate negotiators agreed in Paris to keep the rise in global mean surface temperature “well below” 2 °C. Reaching this target is thought to prevent catastrophic climate change,Footnote 1 but requires a considerable reduction in greenhouse gas emissions. This, in turn, requires collective efforts by a substantial number of countries, which is notoriously difficult since emission reductions are costly for the country undertaking the effort but beneficial for all (Nordhaus 2018).

This “collective action problem at the global scale” (IPCC 2014) is aggravated by the heterogeneity of countries, in particular of developed and developing nations. The latter tend to be more vulnerable to the consequences of climate change than the former since climate change intensifies already existing risks of extreme weather events such as floods, storms, and droughts. These risks tend to be more pronounced in developing countries, also because their economies tend to rely on climate-sensitive sectors like agriculture or fishery for food security and employment and often lack the ability to cope with climatic shocks. In addition, developing countries have fewer (financial) resources to invest in vulnerability-reducing adaptation measures and can contribute less to the prevention of climate change in the first place (Stern 2007; DARA 2012; IPCC 2014). In sum, developing countries are both more vulnerable and poorer than developed countries.

These asymmetries induce conflicting perceptions of how the burden of emission reduction ought to be distributed among nations. While an equal distribution of the burden could be considered a default option, international climate negotiators early on agreed on the principle of “common but differentiated responsibilities” (United Nations 1992; Ringius et al. 2002). Frequently discussed criteria for determining such a differentiated distribution are a country’s vulnerability to the negative consequences of climate change and its ability to pay for emission reduction (Hayward 2012; Gampfer 2014). Standard burden-sharing rules yield, however, conflicting implications since low wealth and high vulnerability tend to come in conjunction. Burden sharing based on a country’s wealth would require higher contributions by developed countries (ability-to-pay rule), i.e. by those who are less affected by the negative consequences of climate change. Using vulnerability as a distribution criterion would entail larger contributions by developing countries (beneficiary-pays rule), i.e. by those who are more vulnerable but usually have fewer resources to contribute to climate change prevention. This availability of multiple legitimate behavioral rules creates normative conflict, a situation in which agents tend to endorse distributions that favor their private interests (Bicchieri 2005). And indeed, self-serving interpretations of fair burden sharing are an important reason for the recurrent deadlock of international climate negotiations (Lange et al. 2010; Carlsson et al. 2013; Brick and Visser 2015).

This study uses the climate change game (Milinski et al. 2008) to disentangle the effects of heterogeneity in wealth and vulnerability (i) on a group’s ability to reach a predefined collective threshold, reflecting the prevention of catastrophic climate change, and (ii) on how the burden is shared conditional on the group’s composition. In the experiment the participants were matched in groups. In each round group members could individually contribute part of their private funds to a public account in order to reach the collective threshold by the end of ten rounds. If the group missed the threshold each member experienced a predefined loss. Conditional on the treatment (see Table 1), group members differed with respect to their contribution capacities (wealth) or the loss they experienced (vulnerability), or both simultaneously. The experimental design thus allowed for an investigation of the effect of group heterogeneity in none (BASE), one (VUL and WLT) or both of these dimensions (WLT-VUL) in a systematic and controlled way. By matching agents who were both poor and more vulnerable with agents who were both rich and less vulnerable in WLT-VUL, the potentially most conflicting composition regarding preferences for burden sharing was brought to the lab. Individual burden sharing preferences were elicited behind a veil of ignorance at the beginning of the experiment to assess the effects of (i) ex-ante normative disagreement on the group level and (ii) self-serving deviations from ex-ante preferences on the individual level.

Table 1 Treatment variations

The results of the experiment show that if group members differ with respect to wealth (WLT), groups are more likely to reach the threshold if the rich carry a larger share of the burden. When members differ with respect to vulnerability (VUL), groups are more successful if the more vulnerable contribute more. Interestingly, the results show that groups are most successful and efficient in case of multidimensional heterogeneity (WLT-VUL), i.e. when low wealth coincides with high vulnerability and vice versa. In this case burden sharing mirrors the one observed in WLT, with the difference that groups are significantly more likely to reach the threshold. The additional vulnerability of the poor seems to facilitate coordination by emphasizing the responsibility of the rich to assist those in need and carry a larger share of the burden.

These findings are in line with burden sharing preferences elicited behind a veil of ignorance. Before being informed about their own type, participants state that it would be fair that the rich, despite being less vulnerable, bear more of the burden in WLT-VUL. Ex-ante normative disagreement about how the burden should be shared is relatively low in homogenous groups (BASE) and groups that are heterogeneous with respect to wealth (WLT), but it is significantly larger in groups that are heterogeneous with respect to vulnerability (VUL). As a consequence, the latter coordinate on a broader variety of different burden-sharing rules than other groups (who mostly coordinate on the same burden-sharing rule) and have more difficulties in reaching the threshold. In general, groups are more likely to succeed if individuals comply with the contribution they indicated as fair behind a veil of ignorance, while self-serving deviations from ex-ante preferences are detrimental to group success.

This study adds to a small body of literature that uses the climate change game to study climate change prevention behavior in an experimental setting. Multidimensional and correlated heterogeneity in wealth and vulnerability has been studied before (Burton-Chellew et al. 2013; Gampfer 2014); however, these studies were not able to systematically disentangle the effects of wealth and vulnerability. The results of this study show that groups are surprisingly successful in solving the coordination problem that arises when the poor are also more vulnerable, and vice versa. The double-disadvantaged position of the poor seems to facilitate coordination by making the responsibility of the rich more salient. In the experiment, the rich readily assume this responsibility, resulting in an impressive success rate of the respective groups.

The remainder of this paper is organized as follows: Sect. 2 gives an overview of the previous literature, Sect. 3 outlines the experimental design and Sect. 4 the theoretical expectations. The results are presented in Sect. 5 and discussed in Sect. 6.

2 Previous Research

This study builds on and complements a slim experimental literature using the so-called climate change game developed by Milinski et al. (2008) to study burden sharing in heterogeneous groups. It focuses on two of the main asymmetries and sources of conflict between climate negotiators, wealth and vulnerability. So far, these two dimensions have not been studied systematically in a 2 × 2 design. Such a design makes it possible to isolate how heterogeneity in wealth and/or vulnerability affects burden sharing and the likelihood that groups reach the threshold, conditional on the type of heterogeneity.

The climate change game is a variant of a threshold public good game (TPGG). It was developed by Milinski et al. (2008) as a means to study experimentally the social dilemma that comes with the prevention of climate change.Footnote 2 In the game, group members can contribute their private funds either to a public account or keep them in their private account. After ten rounds of play, the sum of individual contributions to the public account has to exceed a certain threshold, otherwise each group member loses her private funds with a predetermined probability. Individual contributions are considered to reflect emission reduction efforts and the loss of private funds the costs of catastrophic climate change. Coordinating on reaching the threshold is in the collective interest of the group. This collective effort can, however, be thwarted by group members choosing to free-ride. In Milinski et al. (2008) group members are homogenous with respect to wealth and vulnerability, which turns the game into a coordination game with equal burden sharing as the normatively appealing and hence focal burden-sharing rule. In reality, agents are, however, heterogeneous and find different burden-sharing rules normatively appealing, which makes coordination more difficult.

The climate change game has been used before to study the effect of heterogeneity in wealth. Both Tavoni et al. (2011) and Burton-Chellew et al. (2013) found that heterogeneous groups are less likely to reach the threshold and if they do, the rich tend to carry a larger share of the burden than the poor. These studies provide different explanations for the contribution behavior of the rich. While Tavoni et al. (2011) explain the larger contributions with a concern for fairness of the rich, Burton-Chellew et al. (2013) argue that they contribute more since they have relatively more at stake if the threshold is missed. These findings are not surprising given the previous evidence from TPGGs, in which wealth heterogeneity was repeatedly found to have a negative effect on the provision of the public good (e.g. Rapoport and Suleiman 1993; Bagnoli and McKee 1991; Bernard et al. 2014).

Contrary to heterogeneity in wealth, heterogeneity in vulnerability has not been addressed systematically in the climate change game. To the best of my knowledge, only Burton-Chellew et al. (2013) varied the group members’ vulnerability, albeit not in a ceteris paribus manner. As a consequence, an isolation of the effect was not possible. In comparable PGGs and TPGGs, in which subjects benefited differently from the provision of the public good, subjects were found to support conflicting fairness principles, which was detrimental to group success (e.g. Reuben and Riedl 2013; Nikiforakis et al. 2012; Bagnoli and McKee 1991; Bernard et al. 2014).

If heterogeneity in one dimension is already detrimental to group success, what are the consequences if agents are heterogeneous in both wealth and vulnerability? Again, to the best of my knowledge, only Burton-Chellew et al. (2013) have addressed this question before in a climate change game. They found that groups are more likely to reach the threshold if the rich face a higher risk than the poor. In this case the rich are willing to carry a larger share of the burden. On the other hand, if the poor face a higher risk, the rich reduce their contributions and groups are more likely to fail. Gampfer (2014) observed similar patterns in an ultimatum game setting. In both studies the weaker bargaining position of the poor seemed to mitigate fairness considerations of the rich and induce more self-serving behavior, which was detrimental to group success. In support of this interpretation, Brick and Visser (2015) provide evidence that the adverse effect of heterogeneity is due to a self-serving application of fairness principles. In their variant of a climate change game, groups were more likely to fail if subjects decided as representatives of their country, who were informed about their own and others’ wealth and vulnerability, rather than behind a veil of ignorance which precluded self-serving behavior.

3 Experimental Design

Subjects played ten rounds of a climate change game in groups of four. At the beginning of each round, each subject received a certain number of tokens as operating funds \({o}_{i}\) and could decide to contribute any integer amount of these funds to a public account. The remaining tokens were allocated to her private account. Subjects’ liquidity in each round was constrained to \({o}_{i}\), i.e. saving for future rounds and taking loans on funds to be received in future rounds were precluded by design. In any given round, \({y}_{i}\) denotes an individual \(i\)’s contribution and \({Y}_{j}\) a group’s collective contribution to the public account. By the end of round 10, \(\sum {Y}_{j}\) had to reach the threshold \(T\) of 80 tokens, otherwise each group member lost a certain number of tokens \({L}_{i}\) from her private account. The size of \({o}_{i}\) and \({L}_{i}\) was a function of a subject’s type and was common knowledge. To reach the threshold \(T\), contributions of more than one subject were required since the total amount of operating funds an individual received across all rounds (\({O}_{i}\)) was insufficient to reach the threshold (i.e. \({O}_{i}<T\) for all types). At the beginning of the experimental session, each subject received an additional endowment \(E\) of 40 tokens in her private account which was not available for investment but at stake if \(T\) was not reached. The level of \(E\) was set such that negative payoffs were precluded even if a subject spent her entire operating funds and experienced the maximum loss.

In successful groups, i.e. if a group’s joint contribution to the public account \(\sum {Y}_{j}\) reached or exceeded \(T\) after ten rounds, an individual \(i\)’s payoff \({\pi }_{i}\) was

$$ \pi_{i} = E + O_{i} - Y_{i} \quad {\text{if}} \sum Y_{j} \ge T $$
(1)

where \({Y}_{i}\) denotes the sum of contributions of individual \(i\) across ten rounds.

In failing groups, i.e. if the group’s contributions \(\sum {Y}_{j}\) fell short of \(T\) after ten rounds, group members experienced a loss and an individual \(i\)’s payoff \({\pi }_{i}\) was

$$ \pi_{i} = E + O_{i} - Y_{i} - L_{i } \quad {\text{if }}\sum Y_{j} < T $$
(2)

Unlike in linear one-shot PGGs subjects did not receive a marginal per capita return (\(mpcr\)) from contributions to the public account. Instead, subjects benefited indirectly from contributing to the public account by experiencing no loss if the threshold was reached.
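The payoff rule in Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function name `payoff` and its argument names are hypothetical, and the example values are those of a BASE subject (\({o}_{i}=4\) over ten rounds, so \({O}_{i}=40\); \({L}_{i}=30\); \(E=40\)) who contributes 20 tokens in total.

```python
def payoff(E, O_i, Y_i, L_i, group_total, T=80):
    """Total payoff in tokens for one subject, following Eqs. (1) and (2).

    E: additional endowment, O_i: operating funds summed over all rounds,
    Y_i: the subject's total contribution, L_i: loss if the threshold is missed,
    group_total: the group's total contribution after ten rounds.
    """
    base = E + O_i - Y_i
    if group_total >= T:   # Eq. (1): threshold reached, no loss
        return base
    return base - L_i      # Eq. (2): threshold missed, loss applies


# A BASE subject contributing 20 tokens in total:
print(payoff(E=40, O_i=40, Y_i=20, L_i=30, group_total=80))  # 60
print(payoff(E=40, O_i=40, Y_i=20, L_i=30, group_total=79))  # 30
```

A single missing token at the group level thus costs this subject her full loss of 30 tokens, while her own contribution is not refunded.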

The primary interest of this study is to disclose differences in the behavior of individuals and groups when heterogeneities are multidimensional compared to when they are one-dimensional or absent. Therefore, four different treatments were implemented (see Table 1) using a between-subjects design and partner matching. In the baseline treatment (BASE) subjects were homogenous with respect to wealth and vulnerability. All subjects received 4 tokens per round as operating funds (\({o}_{i}=4\)) which could be invested in the public account. If the group did not reach the threshold by round 10, each subject lost 30 tokens (\({L}_{i}=30\)). In the vulnerability treatment (VUL) subjects were still equally wealthy (\({o}_{i}=4\)) but differed in their degree of vulnerability. Two subjects were more vulnerable and experienced a loss of 40 tokens if the threshold was not reached (\({L}_{i}=40\)), the other two subjects were less vulnerable and lost only 20 tokens (\({L}_{i}=20\)). In the wealth treatment (WLT) subjects had an equal degree of vulnerability (\({L}_{i}=30\)) but differed with respect to their wealth. Two subjects were rich and received 5 tokens per round as operating funds (\({o}_{i}=5\)), while the other two poor subjects received only 3 tokens per round (\({o}_{i}=3\)). In the wealth and vulnerability treatment (WLT-VUL) subjects were heterogeneous with respect to both dimensions. More precisely, both poor subjects were more vulnerable (\({o}_{i}=3, {L}_{i}=40\)), while both rich subjects were less vulnerable (\({o}_{i}=5, {L}_{i}=20\)). Matching rich/less vulnerable and poor/more vulnerable subjects is of particular interest. It not only reflects reality better than the other treatments, but there is also good reason to expect the existence of normative conflict among group members since it is not a priori clear which criterion, wealth or vulnerability, should be used to determine burden sharing (see Sect. 4.3).Footnote 3 The total wealth of the group and the total potential loss for the group were kept constant across treatments (\(\sum {O}_{i}=160\); \(\sum {L}_{i}=120\)) to rule out that efficiency concerns drive the results.Footnote 4 This implies that \({o}_{i}\) and \({L}_{i}\) were set at the average value whenever subjects were homogenous with respect to the respective dimension. For an overview of treatment parameters see Table 2.
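As a quick consistency check, the treatment parameters described above can be written down and verified against the stated design constraints. The sketch below is illustrative only: the dictionary `TREATMENTS` and the helper `check_treatment` are hypothetical names, and the parameter values are taken from the text.

```python
ROUNDS = 10
T = 80   # collective threshold in tokens
E = 40   # additional endowment per subject

# (operating funds per round o_i, loss L_i) for each of the four group members
TREATMENTS = {
    "BASE":    [(4, 30)] * 4,
    "VUL":     [(4, 40)] * 2 + [(4, 20)] * 2,
    "WLT":     [(5, 30)] * 2 + [(3, 30)] * 2,
    "WLT-VUL": [(3, 40)] * 2 + [(5, 20)] * 2,
}


def check_treatment(members):
    """Return (sum of O_i, sum of L_i, O_i < T for all, payoff >= 0 for all)."""
    total_funds = sum(o * ROUNDS for o, _ in members)
    total_loss = sum(L for _, L in members)
    # No single subject can reach the threshold alone.
    no_solo = all(o * ROUNDS < T for o, _ in members)
    # Worst case: a subject contributes all of O_i and still suffers the loss,
    # leaving E - L_i tokens.
    non_negative = all(E - L >= 0 for _, L in members)
    return total_funds, total_loss, no_solo, non_negative


for name, members in TREATMENTS.items():
    print(name, check_treatment(members))  # (160, 120, True, True) in every treatment
```

The check confirms that group wealth and total potential loss are identical across treatments, so differences in outcomes cannot be attributed to differences in what is collectively at stake.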

Table 2 Expected payoffs by treatment and type conditional on equilibrium and burden-sharing rule

Subjects had complete information during the entire session. The types and, consequently, wealth and vulnerability of each group member were common knowledge. At the end of each round a table was displayed showing the group’s contribution as well as an individual breakdown of the group members’ contributions in the current round and in total. Pseudonyms were used to share this information.

At the very beginning of the experiment subjects’ burden sharing preferences were elicited behind a veil of ignorance. Subjects were asked to indicate for each group member the average contribution per round they would consider fair. For this purpose, they were informed about the type of each group member (but not their own member ID or type) and could specify a contribution between 0 and \({o}_{i}\) tokens as fair for the respective subject. In addition, subjects were asked to indicate on a scale from 1 (extremely unlikely) to 7 (extremely likely) how likely they thought it was that the threshold would be reached by round 10. This question was repeated at the beginning of each round, jointly with a question asking subjects to assess the number of tokens they expected the group as a whole to contribute in the current round. Following Gächter and Renner (2010), beliefs were not incentivized in order to prevent their elicitation from influencing contribution behavior. Since beliefs were elicited in the same way across all treatments I have no reason to assume that this procedure explains any treatment effects.

To facilitate coordination, subjects were provided with a simple communication device. Following the procedure in Tavoni et al. (2011), subjects had to pledge simultaneously and publicly at the end of round 3 and round 7 which amount they intended to contribute in the remaining rounds of the game.Footnote 5 Combining the information about intended future contributions of the other group members with the information about cumulative contributions in the past allowed subjects to assess whether the intended contributions would be enough to reach the threshold or whether adjustments were needed. That is, although pledges were non-binding, they were nevertheless not only “cheap talk” (Bochet et al. 2006) since they facilitated coordination. Again, the procedure was the same across treatments. Further analyses with respect to the pledge mechanism are provided in Appendix 1.

3.1 Procedures

The experiment was programmed using z-Tree (Fischbacher 2007) and subjects were recruited via ORSEE (Greiner 2015). After entering the laboratory subjects were assigned randomly to a treatment and a group. The treatment number was displayed on the first screen allowing the lab assistants to hand out the corresponding instructions to each participant (instructions are included in Appendix 2). Before the start of the experiment subjects were provided with a series of control questions that would help them to understand the game. After ten rounds of play, the sum of tokens in the possession of each participant was converted into Euro (4 tokens = €1) and paid to the participant in private before leaving the lab. After completing the experiment and before payment, participants were asked to complete a questionnaire including questions regarding their satisfaction with the outcome of the game, a cognitive reflection test, several socio-demographic questions as well as questions from the social preference module developed by Falk et al. (2016) to assess, for example, their risk taking and reciprocity. Participants received an additional payment of €5 for completing the post-experimental questionnaire. Neutral language was used throughout the experiment.

4 Equilibria and Expectations

4.1 Equilibria and Best Responses

The climate change game is a variant of an n-person cumulative TPGG played for 10 rounds. Contrary to standard TPGGs, as implemented for example by Bagnoli and McKee (1991), Rapoport and Suleiman (1993) or Croson and Marks (2000), the collective target is to avoid a collective bad rather than to create a public good. Contributions happen simultaneously in each round, but there is an element of sequentiality since subjects learn others’ contributions after each round. Contributions to the public account are not returned if the threshold is missed (no refund), neither are contributions above the threshold (no rebate of over-contributions).Footnote 6

Contrary to standard PGGs, which have a unique Nash equilibrium at zero contributions, the existence of a threshold turns the game into a coordination game that is characterized by two sets of equilibria. One is the free-rider equilibrium where all subjects contribute zero in each round. Assuming rational and selfish agents, this equilibrium is an inefficient, pure-strategy Nash equilibrium since free-riding is the individually best response if subjects expect everybody else to free-ride. There is only one pattern of contributions leading to this equilibrium, namely zero total contributions by all. The second is the threshold equilibrium where subjects collectively contribute just enough for the threshold to be reached but not passed, i.e. the allocation of funds is sufficient and efficient. There is a continuum of contribution patterns leading to this equilibrium, which distribute the contribution burden in different ways among the group members. Each of these patterns is a Nash equilibrium if it satisfies the following two constraints: (1) the sum of contributions is exactly equal to the threshold (efficiency constraint: \(\sum {Y}_{j}=T\)), and (2) no group member contributes more than the amount she would lose otherwise (individual rationality constraint: \({Y}_{i}\le {L}_{i}\ \forall i\)) (Croson and Marks 2000). Contributing to the public account is collectively rational since the efficient threshold equilibrium Pareto-dominates the inefficient free-rider equilibrium with collective payoffs equal to 240 and 200 tokens, respectively.
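The two equilibrium constraints and the collective payoffs of 240 and 200 tokens can be illustrated with a short Python sketch. The function names are hypothetical, the example uses the WLT-VUL parameters (members ordered poor, poor, rich, rich), and the check covers only the two constraints named above, not the per-round liquidity limit.

```python
def is_threshold_equilibrium(contributions, losses, T=80):
    """Check the efficiency and individual rationality constraints."""
    efficient = sum(contributions) == T                             # sum of Y_i equals T
    rational = all(y <= L for y, L in zip(contributions, losses))   # Y_i <= L_i for all i
    return efficient and rational


def collective_payoff(contributions, funds, losses, E=40, T=80):
    """Sum of individual payoffs in the group, following Eqs. (1) and (2)."""
    total = sum(E + O - y for O, y in zip(funds, contributions))
    if sum(contributions) < T:   # threshold missed: every member bears her loss
        total -= sum(losses)
    return total


funds = [30, 30, 50, 50]    # O_i in WLT-VUL
losses = [40, 40, 20, 20]   # L_i in WLT-VUL

print(is_threshold_equilibrium([20, 20, 20, 20], losses))  # True
print(is_threshold_equilibrium([15, 15, 25, 25], losses))  # False: 25 > L_i = 20 for the rich
print(collective_payoff([20, 20, 20, 20], funds, losses))  # 240
print(collective_payoff([0, 0, 0, 0], funds, losses))      # 200
```

Because \(\sum {O}_{i}\) and \(\sum {L}_{i}\) are constant across treatments, the collective payoffs of 240 and 200 tokens hold in every treatment; only the set of patterns satisfying the individual rationality constraint differs.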

A profound theoretical analysis of the game is beyond the scope of this experimental study, but a few general comments can be made regarding individual best-response behavior. A rational and selfish agent should only contribute to the public account if and as long as she perceives her contributions to be critical to reach the threshold. Once the threshold cannot be reached anymore due to insufficient collective contributions in previous rounds, she should stop contributing. Similarly, if total contributions are about to exceed the threshold, a subject's best response is to reduce her own contributions as long as she expects the threshold to still be reached. This includes some strategic uncertainty since miscoordination and thus the waste of all previous contributions is possible (Bagnoli and McKee 1991; Croson and Marks 2000; Dannenberg et al. 2015). Note that in the final rounds of the game it might be optimal for a subject to contribute more than her ex-ante willingness to pay, such that \({Y}_{i}>{L}_{i}\). The reason is that, rationally, a subject should treat her contributions in previous rounds as sunk cost. She should assess in each round anew whether her contribution is critical to reach the threshold and base this decision on the information she receives at the end of each round about the group’s cumulative contribution so far and her beliefs about the future contributions of the other group members.
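The sunk-cost argument can be made concrete for the final round. The sketch below is a simplified illustration, not part of the original analysis: it assumes a selfish subject in round 10 who holds a point belief about what the others will add, so that `gap` is the number of tokens still missing after their expected contributions. The function name and arguments are hypothetical.

```python
def final_round_best_response(gap, o_i, L_i):
    """Round-10 contribution of a selfish subject, given the remaining gap.

    gap: tokens still missing after the others' expected round-10 contributions
    o_i: the subject's operating funds in this round
    L_i: the loss she suffers if the threshold is missed
    """
    if gap <= 0:
        return 0      # threshold is reached without her
    if gap <= o_i and gap < L_i:
        return gap    # she is pivotal, and closing the gap costs less than the loss
    return 0          # she cannot close the gap, or it is not worth it


# Less vulnerable subject in VUL (o_i = 4, L_i = 20) who has already contributed 18 tokens:
past = 18
now = final_round_best_response(gap=4, o_i=4, L_i=20)
print(now, past + now)  # 4 22
```

In this example the subject ends up with a total contribution of 22 tokens, which exceeds \({L}_{i}=20\). Contributing the last 4 tokens is still optimal, since her payoff is 40 + 40 - 22 = 58 tokens if the threshold is reached and 40 + 40 - 18 - 20 = 42 tokens if she withholds them and the group fails.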

4.2 Equilibrium Selection: The Role of Fairness Preferences

Given the large number of contribution patterns leading to the efficient threshold equilibrium, other selection criteria are needed to predict more precisely which contribution patterns are likely to be observed. Fairness principles are useful indicators to identify contribution patterns that serve as focal points and facilitate coordination (Schelling 1960). Considering the notions of equality and equity, three salient burden-sharing rules can be identified in the experimental setting at hand. The equal burden-sharing rule requires equal contributions by all players. It reflects the principle of equality with respect to contributions and, in case of BASE, also with respect to outcomes. The ability-to-pay rule requires individual contributions to be proportional to wealth and hence respects the principle of equity but also of equality in outcomes. The third salient rule is the beneficiary-pays rule,Footnote 7 which suggests that those who are expected to benefit more, i.e. lose more when the threshold is missed, contribute relatively more to the public account. In this case the principle of equity is respected, but there is neither equality in contributions nor outcomes (Ringius et al. 2002; Konow 2003; Bernard et al. 2014).

Table 2 gives an overview of individual total contributions \({Y}_{i}\) and the expected payoffs \({\pi }_{i}\) for all treatments and types conditional on the equilibrium (free-rider equilibrium or threshold equilibrium) and the burden-sharing rule (equal burden sharing, ability-to-pay or beneficiary-pays rule). Note that the loss \({L}_{i}\) also reflects the maximum total amount subjects are ex-ante willing to contribute to the public account if they expect to be critical for the threshold being reached, taking into consideration the efficiency constraint and the individual rationality constraint. At this contribution level subjects are ex-ante indifferent between free-riding and contributing the respective number of tokens.

The identification of normatively appealing burden-sharing rules narrows down the set of contribution patterns that are likely to be chosen by subjects (Reuben and Riedl 2013). There exists, however, a variety of legitimate burden-sharing rules in all treatments except BASE. This provides subjects with the flexibility to choose the burden-sharing rule that requires them to contribute the lowest amount, while still being able to justify their relatively low contributions as being fair. Such a self-serving interpretation of fairness principles allows subjects to reduce the tension between their conflicting desires for utility maximization and fairness (Konow 2000). Unfortunately, though, coordination will fail if all subjects interpret the fairness argument in their own favor. Burden-sharing rules might, nevertheless, facilitate coordination if subjects are motivated by inequality aversion and willing to contribute more than their self-interested fair share.

4.3 Expectations

The following expectations regarding the outcomes of the experiment are based on the previous literature and the theoretical thoughts presented above.

4.3.1 Threshold Achievement

Homogenous groups have a unique focal burden-sharing rule, while a set of legitimate burden-sharing rules is available to members of heterogeneous groups. I therefore expect heterogeneous groups (VUL, WLT, WLT-VUL) to be less likely to reach the threshold than homogenous groups (BASE). Assuming self-regarding utility-maximization, agents’ self-serving burden sharing preferences are the most conflicting in WLT-VUL (see below) and I therefore expect to observe the lowest share of groups reaching the threshold in this treatment. If the rich/less vulnerable are, however, motivated by inequality aversion, they might be willing to compensate for the lower wealth of the poor, resulting in a larger share of successful groups in WLT-VUL.

4.3.2 Burden Sharing Preferences

Burden sharing preferences are elicited behind a veil of ignorance at the beginning of the experiment. Since subjects are ignorant of their type at this point in time, ex-ante preferences should not be biased by self-interest. I therefore expect most subjects to reveal preferences in line with one of the salient burden-sharing rules identified above, conditional on their preferences for equality or equity: the equal burden-sharing rule in BASE; the ability-to-pay rule or the equal burden-sharing rule in WLT; the beneficiary-pays rule or the equal burden-sharing rule in VUL. In WLT-VUL I expect subjects to reveal preferences for a larger variety of different burden-sharing rules, since both wealth and vulnerability can serve as reference points for those with a preference for equity, and the equal burden-sharing rule is still appealing to those with a preference for equality.

4.3.3 Individual Contribution Behavior

Under the assumption of self-regarding utility-maximization, I expect to observe contributions in line with the following burden-sharing rules (see also Table 2): In VUL, less vulnerable subjects are expected to contribute according to the beneficiary-pays rule, more vulnerable subjects in line with the equal burden-sharing rule. If subjects differ with respect to wealth (WLT), poor subjects are expected to follow the ability-to-pay rule, while rich subjects are more likely to adhere to the equal burden-sharing rule. In WLT-VUL, the self-interested preferences of both types are even more conflicting: The rich/less vulnerable are expected to contribute in line with the beneficiary-pays rule, the poor/more vulnerable in line with the ability-to-pay rule. As a consequence, assuming self-regarding utility-maximization, I expect the lowest average individual contributions in WLT-VUL.

Contributions might deviate from these expectations if the assumption of self-regarding utility-maximization is relaxed. If subjects are motivated by inequality aversion, I expect the rich in WLT to contribute more and compensate for the lower wealth of the poor; in VUL, the less vulnerable might be willing to share the burden equally in order to avoid inequality in earnings; given the perfect correlation of wealth and vulnerability in WLT-VUL, I also expect the rich/less vulnerable to carry a larger share of the burden than predicted under the assumption of self-regarding utility-maximization.

4.3.4 Burden Sharing in Groups

While fairness rules can be a source of conflict if interpreted in a self-interested way, they can also be the basis of agreement if group members manage to coordinate on one of the rules. Based on the expectations regarding burden sharing preferences, I expect that groups are most likely to succeed in reaching the threshold if they coordinate on equal burden sharing in BASE, the ability-to-pay rule in WLT and the beneficiary-pays rule in VUL. Coordination is expected to be more difficult in WLT-VUL. Implementing the ability-to-pay rule is no equilibrium in this treatment unless subjects are motivated by inequality aversion since it would require a level of contributions of the rich that exceeds their maximum willingness to pay (see Table 2). The question is thus whether the rich/less vulnerable crowd out or whether they are motivated by inequality aversion and feel obliged to assist the poor/more vulnerable with higher contributions.

5 Results

The experiment was implemented in September and October 2016 in the VCEE lab in Vienna. In total, 168 subjects participated in 42 groups: ten groups of four in both BASE and WLT-VUL, eleven groups of four in both VUL and WLT. Slightly more women than men participated (57% and 43%, respectively) and subjects were on average 24 years old. Subjects were mostly students (99%) from a broad range of studies offered at the University of Vienna; one fifth (20%) of them studied economics or business. An overview of basic descriptive statistics is provided in Appendix 3. The experiment lasted approximately 75 min and subjects earned on average €18.8, with a minimum of €10 and a maximum of €24.

The presentation of the results starts with an analysis of threshold achievement in Sect. 5.1. In order to gain a better understanding of the observed treatment differences, ex-ante burden sharing preferences are analyzed in Sect. 5.2, actual burden sharing in Sect. 5.3, and potential self-serving biases in individual contribution behavior in Sect. 5.4. Section 5.5 looks at the dynamics across rounds and Sect. 5.6 finally examines efficiency and earnings.

5.1 Threshold Achievement

Groups were in general quite successful in avoiding a collective loss: 79% of all groups managed to reach the threshold of 80 tokens by round 10. Groups were most likely to miss the threshold in the treatments with one-dimensional heterogeneity: 36% in VUL and 27% in WLT did not reach the threshold. Success rates in these treatments were lower than in BASE, where only 20% of the groups failed, but the differences are not significant (Wilcoxon rank-sum tests, p = 0.099 and p = 0.437, respectively). These findings deviate at first glance from those of previous studies, which found that heterogeneous groups in general perform worse than homogeneous groups (e.g. Bernard et al. 2014; Burton-Chellew et al. 2013). The reason could be that these studies operationalized vulnerability in a different way (as losing everything with a certain probability, see Sect. 3) and did not provide subjects with the possibility to communicate. As has been shown by Tavoni et al. (2011), introducing a simple pledge mechanism, as used in this experiment, can eliminate the adverse effect of heterogeneity on group success.

Due to their multidimensional heterogeneity, groups in WLT-VUL were expected to have the greatest difficulty coordinating. Surprisingly, all groups in WLT-VUL managed to reach the threshold and were thus significantly more likely to do so than in any other treatment (Wilcoxon rank-sum tests, p = 0.003 for WLT-VUL vs. BASE, p = 0.000 for both WLT-VUL vs. VUL and WLT-VUL vs. WLT). The following analyses shed light on the preferences and behavior underlying the observed treatment differences.

5.2 Burden Sharing Preferences

Individual burden sharing preferences were elicited behind a veil of ignorance by asking subjects to indicate the average contribution per round they would consider fair for each member (\(\tilde{y}_{j}\) for all \(j\)). Since subjects knew the type of each group member but not their own type, these ex-ante preferences should not be biased by self-serving considerations. Subjects’ answers were classified in four mutually exclusive categories according to the following procedure: if subjects allocated in total higher contributions to the rich types, i.e. \(\tilde{y}_{r1}+\tilde{y}_{r2}>\tilde{y}_{p1}+\tilde{y}_{p2}\), and to each rich type at least as many tokens as to each poor type, i.e. \(\min\left(\tilde{y}_{r1},\tilde{y}_{r2}\right)\ge \max\left(\tilde{y}_{p1},\tilde{y}_{p2}\right)\), they were classified as preferring the ability-to-pay rule; if they assigned in total more tokens to the more vulnerable types (\(\tilde{y}_{mv1}+\tilde{y}_{mv2}>\tilde{y}_{lv1}+\tilde{y}_{lv2}\)) and at least as many tokens to each more vulnerable subject as to each less vulnerable subject (\(\min\left(\tilde{y}_{mv1},\tilde{y}_{mv2}\right)\ge \max\left(\tilde{y}_{lv1},\tilde{y}_{lv2}\right)\)), they were classified as preferring the beneficiary-pays rule; subjects who considered equal contributions to be fair were classified as preferring the equal burden-sharing rule; if subjects’ allocation did not correspond to any of these rules, they were classified as other.Footnote 8 Importantly, if subjects assigned a sum of contributions insufficient to reach the threshold, they were classified as such.
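The classification procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the function name `classify_preference`, the member keys (`r1`, `p1`, ...), the optional `min_total` argument and the label `"insufficient"` are hypothetical, and the text does not spell out how allocations that fall short of the threshold were labelled.

```python
from typing import Dict, Optional, Sequence


def classify_preference(
    fair: Dict[str, float],
    rich: Sequence[str] = (),
    poor: Sequence[str] = (),
    more_vulnerable: Sequence[str] = (),
    less_vulnerable: Sequence[str] = (),
    min_total: Optional[float] = None,
) -> str:
    """Label one subject's vector of 'fair' contributions with a burden-sharing rule."""
    values = list(fair.values())

    # Assumption: allocations too small to reach the threshold get their own label.
    if min_total is not None and sum(values) < min_total:
        return "insufficient"

    def assigns_more(payers: Sequence[str], others: Sequence[str]) -> bool:
        # True if 'payers' get strictly more in total and each payer gets
        # at least as much as every member of 'others'.
        if not payers or not others:
            return False
        pay = [fair[m] for m in payers]
        oth = [fair[m] for m in others]
        return sum(pay) > sum(oth) and min(pay) >= max(oth)

    if assigns_more(rich, poor):
        return "ability-to-pay"
    if assigns_more(more_vulnerable, less_vulnerable):
        return "beneficiary-pays"
    if len(set(values)) == 1:
        return "equal"
    return "other"


# Example for a WLT-VUL group, where the rich are also the less vulnerable.
types = dict(
    rich=("r1", "r2"),
    poor=("p1", "p2"),
    more_vulnerable=("p1", "p2"),
    less_vulnerable=("r1", "r2"),
)
print(classify_preference({"r1": 3, "r2": 3, "p1": 1, "p2": 1}, **types))
```

In BASE, VUL and WLT the type lists that do not apply are simply left empty, so the corresponding rule can never be assigned.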

Figure 1 shows how much support the different burden-sharing rules received in each treatment. Overall, 89.9% of subjects supported one of the three burden-sharing rules specified in the theoretical part. There were, however, significant differences between treatments (\({\chi }^{2}\)-test, p = 0.000). Subjects in BASE expressed a clear preference for the equal burden-sharing rule (92.5%). Equal burden sharing was also the most popular allocation in VUL (65.9%), but found only little support in WLT (9.1%) and WLT-VUL (12.5%). In both treatments with wealth heterogeneity, the ability-to-pay rule was the preferred allocation with 81.8% in WLT and 80.0% in WLT-VUL. The beneficiary-pays rule found some support in VUL (15.9%), but hardly any in WLT-VUL (2.5%).

Fig. 1 Individual preferences for burden sharing behind a veil of ignorance

The clear preference for the ability-to-pay rule over the beneficiary-pays rule in WLT-VUL is remarkable. The results of WLT and VUL show that if heterogeneity existed in only one of the dimensions, subjects supported very different burden-sharing rules conditional on the type of heterogeneity. As a consequence one would expect to observe much more diversity in subjects’ burden sharing preferences in WLT-VUL or, alternatively, that subjects choose equal burden sharing as a compromise. However, this is not the case and subjects clearly consider the wealth dimension to be the more valid criterion for burden sharing.

5.2.1 Normative Disagreement

Following Bernard et al. (2014), I assessed the level of normative disagreement by calculating the number of different burden-sharing rules that were supported by the members of a group. Normative disagreement was lowest in BASE with groups supporting on average 1.3 different burden-sharing rules. There was slightly more disagreement in WLT and WLT-VUL, with groups supporting on average 1.64 and 1.6 different rules, respectively. The highest level of normative disagreement was observed in VUL with an average of 2.1, significantly above all other treatments (Wilcoxon rank-sum test, p = 0.000 with respect to all other treatments). Since VUL is also the treatment with the lowest success rate, a plausible hypothesis would be that normative disagreement hindered groups from coordinating successfully. However, the correlation between normative disagreement and success rate is only weak and insignificant (Spearman’s r = −0.054, p = 0.488) and does not support this hypothesis. This is in line with Bernard et al. (2014), who argue that subjects might not have very strong intrinsic preferences for a certain burden-sharing rule but rather consider them as useful focal points and are willing to compromise on their initially preferred allocation.
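As a minimal sketch, the disagreement measure amounts to counting distinct preference labels within a group. The function name and the label strings are hypothetical, and counting the residual category "other" as a separate rule is an assumption that the text does not confirm.

```python
from typing import Iterable


def normative_disagreement(labels: Iterable[str]) -> int:
    """Number of distinct burden-sharing rules supported within one group."""
    # Assumption: every label, including "other", counts as a separate rule.
    return len(set(labels))


# Three members favour equal sharing, one favours ability-to-pay.
print(normative_disagreement(["equal", "equal", "ability-to-pay", "equal"]))  # 2
```

Averaging this count over the groups of a treatment would give the treatment-level figures reported above.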

In the following section I analyze which burden-sharing rules groups finally coordinate on and whether the distribution mirrors the one of ex-ante preferences.

5.3 Burden Sharing in Groups

A similar procedure as in Sect. 5.2 is used to categorize actual burden sharing on the group level. Groups were classified as following the ability-to-pay rule if (i) the sum of contributions of both rich types was larger than the sum of contributions of both poor types, i.e. \(Y_{r1}+Y_{r2}>Y_{p1}+Y_{p2}\), (ii) the smaller sum of contributions of a rich type was at least as high as the larger sum of contributions of a poor type, i.e. \(\min\left(Y_{r1},Y_{r2}\right)\ge \max\left(Y_{p1},Y_{p2}\right)\), and (iii) the difference between the average contributions of the rich and the poor was larger than 10 tokens, i.e. \(\frac{Y_{r1}+Y_{r2}}{2}-\frac{Y_{p1}+Y_{p2}}{2}>10\). The last condition guarantees that the rich contributed substantially more than the poor and burden sharing is thus better captured by the ability-to-pay rule than the equal burden-sharing rule. Similarly, groups were classified as following the beneficiary-pays rule if (i) both more vulnerable types contributed in total more than both less vulnerable types, i.e. \(Y_{mv1}+Y_{mv2}>Y_{lv1}+Y_{lv2}\), (ii) the smaller sum of contributions of a more vulnerable type was at least as high as the larger sum of contributions of a less vulnerable type, i.e. \(\min\left(Y_{mv1},Y_{mv2}\right)\ge \max\left(Y_{lv1},Y_{lv2}\right)\), and (iii) the difference between the average contributions of the more vulnerable and the less vulnerable was larger than 10 tokens, i.e. \(\frac{Y_{mv1}+Y_{mv2}}{2}-\frac{Y_{lv1}+Y_{lv2}}{2}>10\). If the observed burden sharing pattern did not conform to any one of these rules and contributions of all members did not deviate by more than 25% from the equal contribution level of 20 tokens, i.e. ranged between a minimum of 15 and a maximum of 25 tokens, groups were classified as following the equal burden-sharing rule.
Groups that missed the threshold were classified as failed since no final statement can be made about burden sharing in these groups. All remaining groups were classified as other.Footnote 9
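The group-level classification can be sketched as follows. The function name `classify_group`, its parameter names and the member keys are hypothetical and chosen for illustration only; the numeric defaults (threshold of 80 tokens, gap of 10 tokens, band of 15 to 25 tokens) are taken from the text.

```python
from typing import Dict, Sequence


def classify_group(
    totals: Dict[str, float],
    rich: Sequence[str] = (),
    poor: Sequence[str] = (),
    more_vulnerable: Sequence[str] = (),
    less_vulnerable: Sequence[str] = (),
    threshold: float = 80,
    min_gap: float = 10,
    equal_low: float = 15,
    equal_high: float = 25,
) -> str:
    """Label a group by the burden-sharing rule its members' total contributions follow."""
    # Groups that miss the threshold are not classified any further.
    if sum(totals.values()) < threshold:
        return "failed"

    def carries_more(payers: Sequence[str], others: Sequence[str]) -> bool:
        if not payers or not others:
            return False
        pay = [totals[m] for m in payers]
        oth = [totals[m] for m in others]
        return (
            sum(pay) > sum(oth)                                    # condition (i)
            and min(pay) >= max(oth)                               # condition (ii)
            and sum(pay) / len(pay) - sum(oth) / len(oth) > min_gap  # condition (iii)
        )

    if carries_more(rich, poor):
        return "ability-to-pay"
    if carries_more(more_vulnerable, less_vulnerable):
        return "beneficiary-pays"
    if all(equal_low <= t <= equal_high for t in totals.values()):
        return "equal"
    return "other"


# A WLT group in which the rich clearly carry the larger share.
print(classify_group({"r1": 30, "r2": 28, "p1": 12, "p2": 10},
                     rich=("r1", "r2"), poor=("p1", "p2")))
```

Condition (iii) is what separates a mild tilt towards the rich (still counted as equal sharing if all totals lie between 15 and 25 tokens) from the ability-to-pay rule proper.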

Figure 2 shows for each treatment the share of groups following the different burden-sharing rules. Just as with burden sharing preferences, there are significant treatment differences with respect to the implemented burden-sharing rules (\({\chi }^{2}\)-test, p = 0.000). In addition, several observations can be made when comparing the burden sharing preferences depicted in Fig. 1 and the actual burden sharing depicted in Fig. 2. Remember that subjects were not informed about the burden sharing preferences of the other group members and could hence not use this information to coordinate their contributions. First, the strong preference for the ability-to-pay rule in WLT and WLT-VUL directly translates into an impressive share of groups actually implementing this rule. In 75% of the successful groups in WLT (54.5% of all groups) and in 80% of the groups in WLT-VUL the rich carry a larger share of the burden than the poor. Second, in VUL the beneficiary-pays rule gained quite some popularity, with 42.9% of successful groups (27.3% of all groups) implementing this rule. Equally surprising, given the strong preference for equal burden sharing that subjects expressed ex-ante, is the fact that only one group in this treatment shared the burden equally. One explanation for this outcome could be the self-serving behavior of the less vulnerable, which pushes the more vulnerable to carry a larger share of the burden. This observation is also in line with Bicchieri (2005), who argues that a social norm is a behavioral rule that many people know exists but do not necessarily follow. Third, at first glance there is a rather low share of groups following the equal-burden-sharing rule in BASE, namely 62.5% of the successful groups (50% of all groups). This is due to several groups in which three subjects adhered to equal burden sharing and a fourth subject contributed an inefficiently high number of tokens (> 25 tokens).
Including these inefficient groups, the share of equally sharing groups among successful groups increases to 87.5% (70% of all groups) in BASE.

Fig. 2 Burden sharing in groups

Overall, when a certain burden-sharing rule was popular ex-ante, groups were also likely to coordinate on this rule. When, however, burden sharing preferences were more heterogeneous, as in VUL, groups also coordinated on a variety of different burden-sharing rules and had in general more difficulties coordinating successfully compared to the other treatments.

5.4 Individual Contributions

In WLT, we saw that a majority of successful groups implemented the ability-to-pay rule. Figure 3 shows that the rich subjects contributed significantly more than 20 tokens, compensating for the lower wealth of the poor who contributed significantly less than this (Wilcoxon signed rank tests, p = 0.000 for both types; the red line in Fig. 3 marks the equal share of 20 tokens). The difference between the contributions of both types is significant (Wilcoxon rank-sum test, p = 0.000). In VUL burden sharing was more diverse and not many groups strictly implemented the beneficiary-pays rule as it was defined in Sect. 5.3. On the individual level, however, we find that, on average, the more vulnerable contributed significantly more than the less vulnerable (Wilcoxon rank-sum test, p = 0.002). That is, the more vulnerable contributed significantly more and the less vulnerable less than the equal share of 20 tokens (Wilcoxon signed rank tests, p = 0.017 and p = 0.084, respectively).

Fig. 3 Average sum of contributions by type, conditional on treatment and group success

The distribution of individual contributions in WLT-VUL closely mirrors the distribution in WLT: the rich/less vulnerable types contributed significantly more than 20 tokens, the poor/more vulnerable types significantly less (Wilcoxon signed rank tests, p = 0.000 for both types), and the difference between the contributions of both types is highly significant (Wilcoxon rank-sum test, p = 0.000). Treatment comparisons reveal that there is no significant difference between the contributions of the rich/less vulnerable in WLT-VUL and the rich in WLT, but the difference is highly significant with respect to the less vulnerable in VUL (Wilcoxon rank-sum tests, p = 0.161 and p = 0.000, respectively). Likewise, the contributions of the poor/more vulnerable in WLT-VUL are not significantly different from those of the poor in WLT, but significantly different from those of the more vulnerable in VUL (Wilcoxon rank-sum tests, p = 0.220 and p = 0.044, respectively). This suggests that the contribution decisions of subjects in WLT-VUL are guided by the distribution of wealth in the group.

The picture looks quite different for failing groups, where all types in all treatments contributed (significantly) less than 20 tokens (Wilcoxon signed rank tests, p = 0.057 for equal types, p = 0.020 for more and less vulnerable types, p = 0.059 for rich types, p = 0.035 for poor types). In particular the rich in WLT and the more vulnerable in VUL reduced their contributions substantially compared to their peers in successful groups. In consequence, in failing groups there are no longer any significant differences in contribution behavior between the rich and the poor in WLT or the more and less vulnerable in VUL (Wilcoxon rank-sum tests, p = 0.872 and p = 0.708, respectively).

To sum up, these findings suggest that if groups are heterogeneous regarding their members’ wealth, the willingness of the rich to carry a larger share of the burden is decisive for group success. Likewise, if groups are heterogeneous with respect to vulnerability, the more vulnerable subjects have to take the lead to ensure group success. In case of multidimensional and correlated heterogeneity the wealth dimension seems to be more salient and drive contribution decisions. Considering the significantly higher success rate of groups in WLT-VUL than in WLT, the role of vulnerability should, however, not be underestimated. In line with needs-based distributive justice norms, it is plausible to assume that if low wealth coincides with high vulnerability, the rich might feel a stronger responsibility to assist those in need and to take care that the collective bad, which would affect the already poor disproportionately, is prevented (Konow 2010; Gampfer 2014). The effect of heterogeneity in vulnerability on burden sharing is thus reversed when it coincides with heterogeneity in wealth.

5.4.1 Self-Serving Behavior

The results presented so far suggest that a self-serving interpretation of burden sharing rules is the reason why many groups fail. And indeed, column 1 in Table 3 confirms that differences in group success cannot be attributed to differences in ex-ante preferences (with respect to contributions of their type) since there are no significant differences in ex-ante preferences between members of successful and failing groups (Wilcoxon rank-sum tests, p = 0.239 for equal types, p = 0.606 for more vulnerable types, p = 0.383 for less vulnerable types, p = 0.755 for poor types, p = 0.256 for rich types).

Table 3 Ex-ante preferences, average contributions and 1st round contributions by treatment and type, conditional on group success

Notably, in successful groups, subjects’ average contributions overall (column 2 in Table 3) and in round 1 (column 3) do not deviate significantly from their ex-ante preferences (column 1) (Wilcoxon signed rank tests, p = 0.069 and p = 0.549, respectively). We find, however, evidence for self-serving behavior in failing groups: Subjects who belonged to groups that ultimately failed contributed significantly less not only on average (column 4), but already in the first round (column 5) than what they had specified ex-ante as fair (Wilcoxon signed rank tests, p = 0.000 and p = 0.021, respectively).

These findings suggest that groups are more likely to succeed if subjects contribute what they consider a fair contribution for their type behind a veil of ignorance and abstain from adjusting their contributions in a self-serving way. See Appendix 4 for an OLS regression backing these results.

5.5 Dynamics

This section takes a look at factors that might determine a change in individual contributions in a given round relative to the previous round: others’ contributions, individual beliefs about others’ contributions as well as beliefs about the group reaching the threshold.

Model (1) in Table 4 shows that subjects adjusted their contributions relative to others’ contributions in the previous round: if the other group members contributed on average more (less) in the previous round, subjects increased (decreased) their contribution in the current round significantly (contr others—contr ego, lag1). They also increased (decreased) their contributions if they corrected their beliefs about others’ contributions upwards (downwards) relative to the previous round (Δ belief group contribution). Both effects are evidence for a conditional willingness to contribute. Surprisingly, there is no evidence that subjects adjusted their contributions if their beliefs regarding the likelihood of reaching the threshold changed (Δ threshold belief). One reason might be that subjects mostly adjusted their beliefs not at all or only slightly from one round to the next. In 65% of all rounds subjects did not adjust their beliefs and in 27% only by one unit (on a scale from 1 to 7). Theoretically, subjects should, however, adjust their beliefs and hence contributions drastically as soon as the threshold is out of reach due to insufficient contributions in previous rounds. Including a dummy variable that takes the value one in the round in which the threshold moves definitely out of reach (target out of reach), I indeed find a strong and significant negative effect on contributions. Subjects seemed to understand that further contributions are in vain and rationally decreased their contributions. 94% of the respective subjects reduced their contributions to the rational level of zero. All results presented in this section are robust to considering only successful groups in model (2) and only failing groups in model (3).

Table 4 Panel model—determinants of change in individual contributions

To sum up, the results provide evidence for subjects’ conditional willingness to contribute: Subjects adjust their contributions to others’ contributions in previous rounds and also to changes in their beliefs regarding others’ contributions. Once the threshold cannot be reached any longer, subjects respond rationally by decreasing their contributions immediately and significantly. A more detailed analysis of the role of beliefs is provided in Appendix 5.

5.6 Efficiency and Earnings

Finally, two alternative performance concepts besides the success rate are discussed: efficiency and the distribution of earnings among group members.

5.6.1 Efficiency

When groups missed the threshold, their members not only experienced a loss, but their entire contributions to the public account were wasted. As a consequence, successful groups were strikingly more efficient than failing groups. While successful groups wasted on average 2.9 tokens due to excess contributions, failing groups contributed—and thus wasted—on average 45.6 tokens.
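The notion of waste used here can be written down as a small helper. This is a sketch based on the description above; the function name `wasted_tokens` is hypothetical, and the threshold of 80 tokens is taken from Sect. 5.1.

```python
def wasted_tokens(group_total: float, threshold: float = 80) -> float:
    """Tokens contributed without effect on the group outcome."""
    if group_total >= threshold:
        # Successful group: only contributions beyond the threshold are wasted.
        return group_total - threshold
    # Failing group: the entire contribution to the public account is wasted.
    return group_total


print(wasted_tokens(83))  # successful group with 3 excess tokens -> 3
print(wasted_tokens(65))  # failing group -> 65
```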

Among successful groups, groups in VUL had the largest difficulties in coordinating. They wasted on average 4 tokens, significantly more than groups in WLT-VUL (1.8 tokens) and WLT (2.5 tokens) (Wilcoxon rank-sum tests, p = 0.000 and p = 0.006, respectively). Among failing groups, groups in BASE did a particularly bad job by failing at a very high average contribution level of 65 tokens compared to the relatively lower levels in VUL (39.5 tokens) and WLT (40.3 tokens) (Wilcoxon rank-sum tests, p = 0.003 and p = 0.000, respectively).

5.6.2 Earnings

Coordinating on the threshold in general paid off for subjects, not least because of the high (low) level of efficiency in successful (failing) groups. Members of successful groups earned significantly more than members of failing groups across all treatments (Wilcoxon rank-sum tests, p = 0.000 for all treatments). Figure 4 shows that the less vulnerable in VUL, the rich in WLT and the rich/less vulnerable in WLT-VUL earned on average significantly more than the respective other types, both in successful (Wilcoxon rank-sum tests, p = 0.002 for VUL and WLT, p = 0.000 for WLT-VUL) and failing groups (Wilcoxon rank-sum tests, p = 0.002 for VUL, p = 0.006 for WLT).

Fig. 4 Average individual earnings conditional on type, treatment and group success

Remember, however, the differences in the initial wealth situation in VUL, on the one hand, and WLT and WLT-VUL, on the other hand. In VUL, both types were provided with the same total amount of operating funds. This implies that if the less vulnerable earned more, this is because they contributed less to the public account than the more vulnerable. In WLT and WLT-VUL, on the other hand, rich subjects earned more despite contributing a larger share to the public account because they had higher operating funds than the poor in the first place. From this perspective, inequality decreased significantly from 20 to 5.88 tokens in WLT and from 20 to 7.9 tokens in WLT-VUL, while it increased significantly in VUL from zero to 9.3 tokens (Wilcoxon signed rank tests, p = 0.000 for all comparisons). This is also illustrated by deviations from the red line in Fig. 4 marking 60 tokens, which is the amount individual group members would earn if contributions were fully efficient and all group members earned the same.

The findings in WLT and WLT-VUL are in line with those of Tavoni et al. (2011) who found that successful groups are able to eliminate initial wealth heterogeneity. The opposite happened, however, in VUL where initial equality in wealth was eliminated. In this treatment payoff equality seemed to be less of a concern for subjects. One explanation might be that the wealth dimension was less salient and subjects focused on the different benefits the various types would derive from reaching the threshold.

6 Discussion and Conclusion

The results of this experiment provide new insights into the role of vulnerability, wealth and fair burden sharing when heterogeneous groups seek to overcome a collective action problem. While heterogeneity of wealth has been studied in the experimental literature on climate change before, heterogeneity in vulnerability has hardly ever been addressed (see Sect. 2 for references). Importantly, the interaction between both dimensions has not been studied systematically before. This is problematic since the conflict of interest between countries involved in climate change negotiations is triggered to a large extent by the fact that their wealth and vulnerability tend to be correlated. As a consequence, climate negotiators tend to support conflicting burden-sharing rules.

The most surprising result of this study is that groups are very effective in solving the dilemma that emerges when low wealth coincides with high vulnerability, and vice versa. While heterogeneity in vulnerability by itself poses a challenging coordination problem for groups, it seems to have a facilitating effect once it coincides with wealth heterogeneity. Needs-based distributive justice norms provide a plausible explanation for this effect (Konow 2010; Gampfer 2014): The fact that the already poor would in addition suffer disproportionately from the collective bad makes the responsibility of the rich to take a larger share of the burden, and thus assist those in need, more salient. In the experiment, the rich seem to readily assume this responsibility, resulting in an impressive success rate and a high efficiency of contributions in the respective groups. These findings challenge those of Burton-Chellew et al. (2013) and Gampfer (2014) who find that vulnerability has the opposite effect, i.e., that it significantly decreases the willingness to pay of the rich and less vulnerable. As a consequence, in their experiments these groups were significantly less successful in reaching the threshold. One reason for the diverging results might be the provision of a pledge mechanism as in Tavoni et al. (2011), which facilitated coordination in this experiment.

The results presented in this paper suggest that a focus on a needs-based perspective might be promising in the ongoing climate negotiations, where countries that are both poor and more vulnerable negotiate with countries that are both rich and less vulnerable. Emphasizing the double-disadvantaged position of the former might trigger inequality aversion in the latter and in turn facilitate coordination for reaching the collective target. However, the real-world situation is of course much more complex than the one modeled in the lab, where subjects were confronted with a relatively simple task. Aspects other than the heterogeneities studied here might explain the difficulties of international climate negotiators to reach a consensus on how to share the burden of emission reductions, not only compared to the groups in this experiment but also compared to other more successful international negotiations characterized by the heterogeneity of their agents, such as those on the Montreal Protocol on Substances that Deplete the Ozone Layer that involved side payments from developed to developing countries (Benedick 1998).

First, the focus of my study was on two spatial dimensions of climate change, wealth and vulnerability, ignoring a historical dimension. Economic wealth is often attributed to large greenhouse gas emissions in the past, implying a historical responsibility of wealthier nations for the current state of the world’s climate (Fuessel 2010). From a fairness perspective, this bolsters the claim of developing countries for larger contributions from developed countries. The implications of historical responsibility for burden sharing have been studied in lab experiments by Tavoni et al. (2011) and Gampfer (2014), but it would be an interesting avenue for further research to study the interaction of all three dimensions (wealth, vulnerability, historical responsibility) in the given experimental setting. Second, to keep things simple, I abstained from including uncertainty regarding the location of the threshold and the consequences of missing it. Remember that the introduction of a certain threshold turns the game into a coordination game with two Nash equilibria. Previous experimental studies show that while uncertainty about the consequences of missing the threshold does not impede coordination, uncertainty about its location is detrimental to success (Barrett and Dannenberg 2012, 2014b; Brown and Kroll 2017). The attempt of international climate negotiators to establish the 2 °C-target as a threshold to coordinate on falls short of providing the degree of certainty required for successful coordination (because countries only control their emissions directly, while the effect of the emissions on temperature is uncertain). Again, it would be worthwhile to study how introducing uncertainty about the threshold, which would turn the situation from a coordination game into a prisoner’s dilemma, affects success rates in the current experimental setting.
While I would expect success rates to drop across treatments, I would expect the main treatment effect to be robust to such a change in parameters. Third, decisions in my experiment are made by individuals, while governments are the decision makers in climate negotiations. What individuals perceive as fair matters, however, since governments, at least in democracies, consider the public opinion prevailing in their country when sitting at the negotiation table, not least because the outcomes of the negotiations have to be translated into domestic policy measures, for example, on energy supply and use, which directly affect the public (Gampfer 2014). Fourth, one has to consider the often considerable power of special interest groups who would lose from a climate agreement (e.g. the oil industry) and lobby against it although such an agreement might be in the best interest of the country in general (Stiglitz 2015). Fifth, participants in the experiment were students, who are not representative of the general population, and care has to be taken when generalizing the results (Henrich et al. 2010). Further aspects not considered in the experiment are the potential gradual benefits from investing in climate change prevention, for example, savings due to a more efficient use of energy, or the potential gradual consequences of climate change as opposed to the catastrophic event modeled in this study. For experimental studies considering the former see Barrett and Dannenberg (2012); for experiments on the latter see Barrett and Dannenberg (2014a) or Freytag et al. (2014).

This study was motivated by conflicting interests in international climate negotiations between developing countries, on the one hand, and developed countries, on the other hand. The bold categorization of developing countries as poor and more vulnerable and developed countries as rich and less vulnerable, assuming a perfect correlation of wealth and vulnerability, admittedly disregards the large heterogeneity within these groups of countries. Note, however, that wealth and vulnerability are important criteria for the formation of coalitions of countries who work together in the climate negotiations. An example is the group of Small Island Developing States, a coalition of 40 developing island states that are particularly vulnerable to a rise in sea-levels (UNFCCC 2021). Note, also, that conflicting interests arising from heterogeneity in wealth and vulnerability exist not only on the country level, but also on the individual level. This notion is also supported by the Intergovernmental Panel on Climate Change, which states that risks from climate change are “generally greater for disadvantaged people and communities in countries at all levels of development” (IPCC 2014).