1 Introduction

In negotiations under incomplete information, people commonly need to make strategic decisions about whether and how to reveal information to others. For example, consider a scenario in which a bank is offering to purchase a struggling company in return for potential job cuts. The unions may not allow the company to accept the offer because they refuse to agree to layoffs. However, if the bank discloses that it is committed to keeping the company afloat, the unions may agree to the buy-out despite the layoffs. On the other hand, revealing goals is often associated with a cost or can be exploited by the other party. In our example, if the bank declares it is committed not to liquidate the company, the unions may demand no layoffs.

The focus of this paper is on negotiation settings in which participants lack information about each other’s preferences, often hindering their ability to reach beneficial agreements. Specifically, we study a particular class of such settings we call “revelation games”, in which players are given the choice to truthfully reveal private information before commencing a finite sequence of alternating negotiation rounds. Revealing this information narrows the search space of possible agreements and may lead to agreement more quickly, but may also expose players to exploitation by others.

Revelation games combine two types of interactions that have been studied in the past in the economics literature: Signalling games [25], in which players choose whether to convey private information to each other, and bargaining [19], in which players engage in multiple negotiation rounds. They are analogous to real-world scenarios in which parties may choose to truthfully reveal information before negotiation ensues, such as the example presented above.

Constructing effective agent strategies for such settings is challenging. On the one hand, behavioral economics work has shown that people often follow equilibrium strategies [4] when deciding whether to reveal private information to others. On the other hand, people’s bargaining behavior does not adhere to equilibrium [8, 18], and computers cannot use such strategies to negotiate well with people [17].

The main contribution of the paper is an agent design that incorporates information-revelation decisions into its negotiation strategy using decision theory and machine learning. The agent uses past data to determine which proposals people are likely to accept, and whether they are more or less likely to accept an offer when information is revealed. It combines a prediction model of people’s behavior in the game with a decision-theoretic approach to make optimal decisions. The model includes the social factors that affect people’s decisions whether to reveal private information (e.g., the generosity, competitiveness and selfishness of the offers they make, as measured in the scoring function of the game). In addition, the model includes the effects of people’s revelation decisions on their negotiation behavior. We evaluated the agent design against agents playing equilibrium strategies, as well as against other people, in two types of revelation games that varied how players depend on each other in the game.

The results showed that the agent was able to outperform both human players playing other people and the equilibrium agent. It learned to make offers that were significantly more beneficial to people than the offers made by other people, while not compromising its own benefit, and it reached agreement significantly more often than did people or the equilibrium agent. In particular, it was able to exploit people’s tendency to agree to offers that are beneficial to the agent when they revealed information at the onset of the negotiation. It also positively affected people’s play, in that their overall performance was significantly better when playing the agent than when playing other people. The paper thus has significance for agent designers, showing that (1) people do not adhere to equilibrium strategies when revealing information in negotiation; and (2) reasoning about the social factors that affect people’s decisions can significantly improve agents’ performance as compared to using equilibrium strategies.

This paper extends an initial study of revelation games [22] in several ways. First, it provides a formal equilibrium analysis of revelation games. Second, it analyzes people’s behavior in these games, including the extent to which they respond to proposals made by the agent, and how their performance is affected by playing with the agent. Lastly, the empirical analysis includes additional games beyond the ones originally reported, using subjects from different countries. This provides further empirical support for the generalizability of the agent to different demographic groups.

2 Related work

Our work is related to a growing line of work in multi-agent systems that uses opponent modeling to build agents for repeated negotiation in heterogeneous human–computer settings. These include the KBAgent, which made offers with multiple attributes in settings that supported opting-out options and partial agreements [20]. This agent used a social utility function to consider the trade-offs between its own benefit from an offer and the probability that the offer is accepted by people. It used density estimation to model people’s behavior, following a method suggested by Coehoorn and Jennings for modeling computational agents [6], and approximated people’s reasoning by assuming that people would accept offers from computers that are similar to offers they make to each other. Other works employed Bayesian techniques [7, 15] or approximation heuristics [16] to estimate people’s preferences in negotiation, and integrated this model with a pre-defined negotiation or concession strategy to make offers. Bench-Capon [5] provides an argumentation-based mechanism for explaining human behavior in the ultimatum game. Work by Rosenfeld and Kraus [23] showed that Aspiration Adaptation Theory was more useful than other bounded and strictly rational models in quantifying people’s negotiation preferences. There is also prior work dealing with selective information disclosure in human–agent settings [2, 14, 24].

None of these works allowed agents to reveal private information during negotiation. In addition, we are the first to develop a strategic model of people’s negotiation behavior that reasons about information revelation, and to formalize an optimal decision-making paradigm for agents using this model.

Gal and Pfeffer [11] proposed a model of human reciprocity in a setting consisting of multiple one-shot take-it-or-leave-it games, but did not evaluate a computer agent or show how the model can be used to make decisions in the game. Other works [9, 10, 13] combined a decision-theoretic approach with a set of rules to adapt to people’s negotiation in settings with non-binding agreements. Our work augments these studies in allowing players to reveal private information and in explicitly modeling the effect of revelation on people’s negotiation behavior.

Our work is also related to computational models of argumentation, in that people’s revelation decisions provide an explanation of the type of offers they make during negotiation. Work in interest-based negotiation has studied different protocols that allow players to reveal their goals in negotiation in a controlled fashion [21, 26]. These works assume that agents follow pre-defined strategies for revealing information and do not consider or model human participants.

Lastly, revelation games, which incorporate both signaling and bargaining, were inspired by canonical studies showing that people learn to play equilibrium strategies when they need to signal their private information to others [4]. On the other hand, people’s bargaining behavior does not adhere to equilibrium [8, 18], and computers cannot use such strategies to negotiate well with people [17]. Our work shows that integrating opponent modeling and density estimation techniques is an effective approach for creating agents that can outperform people as well as equilibrium strategies in revelation games.

3 Implementation: colored trails

We based our empirical work on a test-bed called Colored Trails [12], which we adapted to model revelation games with 2 rounds, the minimal number that allows an offer to be made by both players. Our revelation game is played on a board of colored squares. Each player has a square on the board that is designated as its goal, and the object of the game is to reach this goal square. Moving to an adjacent square requires surrendering a chip in the color of that square. Players have full view of the board and each other’s chips. Both players are shown two possible locations for their goals with associated belief probabilities, but each player can only see its own goal.

Fig. 1 Two CT revelation games (shown from Bob’s point of view). a Symmetric board game, b asymmetric board game, c a possible proposal

We define the no-negotiation alternative score for a player as the score obtained when agreement is not reached. The study included two CT board games. The first board game, called symmetric, is shown in Fig. 1a. The initial chip allocation for each player is specified in the Player Chip Display panel. Here, the “me” and “O” icons represent two players, Bob and Alice, respectively. Each player has two possible goals. Bob’s true goal is located four steps to the left of the “me” icon (appearing as a white G square), while Bob’s other possible goal is located four steps below the “me” icon (appearing as a grey square outlined with a “?” symbol). In turn, Alice’s possible goals are presented as two grey circles outlined with “?” symbols. The board is presented from Bob’s point of view: Bob can see its true goal location, but Alice cannot observe it. Unless Bob chooses to reveal its goal, Alice does not know whether Bob needs a purple or a light-green chip to reach the goal. Similarly, Bob cannot observe Alice’s true goal location. The number “50” on each goal square represents a 50 % probability that the true goal lies in that square. In the symmetric board, the lengths of the paths between Alice’s location and each of her possible goal squares are equal. Consequently, for the symmetric board, the no-negotiation alternative score of each player does not depend on its true goal location (the analysis from Bob’s point of view is similar). In contrast, in the second board, called asymmetric, shown in Fig. 1b, one of each player’s possible goal locations is closer to its starting position than the other.

Our CT game progresses in three phases with associated time limits. In the revelation phase (round 0), both players can choose to truthfully reveal their goal to the other player.Footnote 1 In the proposal phase (round 1), one of the players is randomly assigned the role of proposer and can offer to exchange a (possibly empty) subset of its chips with a (possibly empty) subset of the chips of the other player. If the responder accepts the offer, the chips are transferred automatically according to the agreement, both participants are automatically moved as close as possible to their goal squares given their chips, and the game ends. If the responder rejects (or no offer was received), it is able to make a counter-proposal (round 2). If this proposal is accepted, the game ends with the agreement implemented as above. Otherwise, the game ends with no agreement.

At the end of the game, the score for each player is computed as follows: 100 points bonus for reaching the goal; 5 points for each chip left in a player’s possession, and 10 points deducted for any square in the path between the players’ final position and the goal-square.Footnote 2
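This scoring rule can be written compactly as a function (a minimal Python sketch; the argument names are ours):

```python
def ct_score(reached_goal: bool, chips_left: int, dist_to_goal: int) -> int:
    """Score at the end of a CT game: a 100-point bonus for reaching the
    goal, 5 points per chip left in the player's possession, and a
    10-point penalty per square between the final position and the goal."""
    bonus = 100 if reached_goal else 0
    return bonus + 5 * chips_left - 10 * dist_to_goal

# A player who stays at its starting position with 24 chips, four squares
# from its goal, obtains the no-negotiation score 5*24 - 10*4 = 80 points
# (this matches the symmetric board discussed in Sect. 4.4).
assert ct_score(False, 24, 4) == 80
```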

The CT game configuration described above is analogous to task settings involving information revelation and incomplete information. We illustrate this analogy using the union agreement scenario presented in Sect. 1. Goals on the board represent private information that is available to the negotiation parties, such as whether the bank intends to liquidate the company. Different squares on the board represent different types of sub-tasks, such as agreeing on the number of layoffs, hiring new personnel, and deciding on compensation. Chips correspond to agent capabilities and skills required to fulfill sub-tasks. Traversing a path through the board to the goal square represents reaching an agreement that is composed of the various sub-tasks on the path. The game is interesting because players need to reason about the tradeoff between revealing their goals, which provides information to the other player, and concealing their goals, which makes it possible to receive, or ask for, more chips than they need. In addition, there is an advantage to the proposer in the second round, in that it makes the final offer in the game, but the identity of the second proposer is not known at the time players decide whether to reveal their goals.

4 Equilibrium analysis

In this section we formalize the colored trails game described earlier as a repeated game of imperfect information. Each player has a type \(t_i\) that represents the true position of its goal on the board.Footnote 3 Let \(\omega ^{n}\) represent an offer made by a proposer player at round \(n\in \{1,2\}\) in a game. Let \(r^{n}\in \{\mathsf {accept,reject}\}\) represent the response to \(\omega ^{n}\) by a responder player. Let \(s_i\) represent player \(i\)’s score in the game, as described in the previous section. The no-negotiation alternative (NNA) score to player \(i\) of type \(t_{i}\) is the score for \(i\) in the game given that no agreement was reached. We denote the score for this event as \(s_{i}(\emptyset \mid t_i)\). If no agreement was reached in round 1 of the game, players’ final scores depend on whether the counter-proposal in round 2 is accepted. If no agreement was reached in round 2 (the last round) of the game, players’ NNA score is also their final score in the game. We denote the benefit to player \(i\) from \(\omega ^{n}\) given that \(r^{n}=\mathsf {accept}\) as \(\pi _{i}(\omega ^{n}\mid t_{i})\). This is defined as the difference in score to \(i\) between an offer \(\omega ^{n}\) and the NNA score:

$$\begin{aligned} \pi _{i}(\omega ^{n}\mid t_{i})=s_{i}(\omega ^{n}\mid t_{i})-s_{i}(\emptyset \mid t_i) \end{aligned}$$
(1)

Let \(T_{i}\) denote the set of types for a general player \(i\). Let \(\mathsf {rev_i}\) denote the event in which \(i\) reveals its type \(t_{i}\in T_{i}\), and let \(\mathsf {noRev_i}\) denote the event in which \(i\) does not reveal its type. Let \(h^{n}\) denote a history of moves, including both players’ revelation decisions at the onset of the game and the proposals and responses for rounds 1 through \(n\). For the remainder of this section we assume (without loss of generality) that player \(j\) is the proposer in round 1 and player \(i\) is the proposer in round 2. We denote by \(p(t_j\mid h^n)\) player \(i\)’s probabilistic belief that player \(j\) is of type \(t_j\) (and similarly for player \(j\)).

Let \(a_{j}^{2}(\omega ,t_j)\) denote the strategy for a responder \(j\) of type \(t_j\) at round 2 that accepts any proposal \(\omega \) with non-negative benefit:

$$\begin{aligned} a_{j}^{2}(\omega ,t_j)={\left\{ \begin{array}{ll} \mathsf {accept} &{}\quad if \, \pi _j \left( \omega \mid t_{j}\right) \ge 0\\ \mathsf {reject} &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(2)

We will abuse notation and write \(a_j^2\) when the proposal \(\omega \) is clear from context. We extend Eq. 1 to define the benefit to proposer \(i\) given the response strategy \(a_{j}^{2}(\omega ,t_j)\).

$$\begin{aligned} \pi _i(\omega , a^2_j, t_j\mid t_i)={\left\{ \begin{array}{ll} \pi _i(\omega \mid t_i) &{}\quad if \, a_j^2(\omega ,t_j)=\mathsf {accept}\\ 0&{}\quad \mathsf {otherwise} \end{array}\right. } \end{aligned}$$
(3)

We denote a proposal from \(j\) to \(i\) in round 1 as \(\omega _{j,i}^{1}\) and a proposal from \(i\) to \(j\) in round 2 as \(\omega _{i,j}^{2}\) (also called \(i\)’s counter-proposal). We define several possible strategies for making revelation decisions and proposals in the game, and proceed to show that these strategies are in equilibrium under certain sufficient conditions.

4.1 Revelation by both players

We formalize a revelation equilibrium using backward induction and then provide the conditions under which the equilibrium holds. We assume two possible types for player \(j\), denoted \(t_{j_1}\) and \(t_{j_2}\), and two possible types for player \(i\), denoted \(t_{i_1}\) and \(t_{i_2}\). We begin with the following definitions. Let \(\omega _{i,{j}}^{2^{*}}(t_j)\) denote a proposal in round 2 that maximizes the benefit of player \(i\) given that player \(j\) is of type \(t_j\), the history \(h^1=\{ \mathsf {rev_{i}, rev_j}, \omega _{j,i}^1, \mathsf {reject} \}\), and that player \(j\) uses strategy \(a^2_j\) of Eq. 2 to accept proposals.

$$\begin{aligned} \omega _{i,{j}}^{2^{*}}(t_j) \in \underset{\omega _{i,{j}}^{2}\in \varOmega _{i,j}}{\mathrm{arg }\,\mathrm{max }}\; \pi _{i}\left( \omega _{i,{j}}^{2}, a_j^2,t_j \mid t_i\right) \end{aligned}$$
(4)

where \(\varOmega _{i,j}\) is the set of all proposals that player \(i\) can make to player \(j\).
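Since \(\varOmega _{i,j}\) is finite, Eq. 4 can be computed by brute-force search. A minimal sketch, assuming helper functions benefit_i and benefit_j that implement Eq. 1 for the known types of \(i\) and \(j\):

```python
def best_counter_proposal(proposals, benefit_i, benefit_j):
    """Eq. 4 by exhaustive search: maximize i's benefit over all proposals,
    where a proposal that j (of known type) would reject under Eq. 2
    yields zero benefit to i (Eq. 3)."""
    def realized_benefit(omega):
        return benefit_i(omega) if benefit_j(omega) >= 0 else 0
    return max(proposals, key=realized_benefit)
```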

Let \(\omega _{j,i}^{1^*}(t_i)\) be the proposal that maximizes player \(j\)’s benefit subject to providing player \(i\) with the same benefit as the counter-proposal that \(i\) would make in round 2.

$$\begin{aligned} \omega _{j,i}^{1^*}(t_i) \in \underset{\omega _{j,i}\in \varOmega _{j,i}}{\mathrm{arg }\,\mathrm{max }}\; \left( \pi _j (\omega _{j,i} \mid t_j) s.t. \pi _i(\omega _{j,i} \mid t_i) = \pi _i ( \omega _{i,{j}}^{2^{*}} \mid t_i ) \right) \end{aligned}$$
(5)

where \(\omega _{i,j}^{2^*}\) satisfies Eq. 4.

We say that \(t_{j_1}\) is preferable for \(j\) if, given that \(j\) is of type \(t_{j_1}\), its benefit from the round-2 proposal computed for type \(t_{j_1}\) is greater than its benefit from the round-2 proposal computed for type \(t_{j_2}\):

$$\begin{aligned} \pi _j(\omega _{i,j}^{2*}(t_{j_1}) \mid t_{j_1}) \ge \pi _j(\omega _{i,j}^{2*}(t_{j_2}) \mid t_{j_1}) \end{aligned}$$
(6)

Similarly, we can define when \(t_{j_2}\) is preferable for \(j\). For example, consider the asymmetric game of Fig. 1b. For each player there are two types in this game: a “weak” type, which is missing two chips to get to the goal, and a “strong” type, which is missing one chip to get to the goal. The weak type will prefer the proposal that assumes it is the weak type (which will provide it with two chips) to the proposal that assumes it is the strong type (which will provide it with one chip). Thus, for each player, the weak type is preferable to the strong type.

We propose the following perfect Bayesian equilibrium.

Round 0::

Both players reveal their types, for each of their types.

Round 1::

We distinguish between two cases. In each case we specify how \(i\) updates its beliefs about \(j\)’s type and which proposal \(j\) makes in round 1. In the first case, player \(j\) reveals its type (i.e., follows the equilibrium strategy). In this case, player \(i\) updates its beliefs to assign probability 1 to \(j\)’s type. In the second case, player \(j\) deviates from equilibrium and does not reveal its type. In this case, player \(i\) updates its beliefs over \(j\) as follows. If type \(t_{j_1}\) is preferable for \(j\), then player \(i\) updates its beliefs to assume that player \(j\) is of type \(t_{j_2}\):

$$\begin{aligned} P(t_{j_1} \mid h_1) = 0 \end{aligned}$$
(7)

(And conversely for the case in which type \(t_{j_2}\) is preferable for player \(j\).) If \(i\) revealed its type in round 0, then player \(j\) assigns probability 1 to \(i\)’s type. Otherwise, it arbitrarily assigns probability 1 to \(t_{i_1}\). We now describe the actions for players \(i\) and \(j\) in round 1. Player \(j\) makes a proposal \(\omega _{j,i}^{1^*}(t_i)\) that satisfies Eq. 5. In turn, player \(i\) uses the following acceptance strategy \(a_i^1(\omega ,t_i,t_j)\):

$$\begin{aligned} a_{i}^{1}(\omega ,t_i,t_j)={\left\{ \begin{array}{ll} \mathsf {accept} &{}\quad if \, \pi _i( \omega \mid t_i) \ge \pi _i(\omega ^{2^*}_{i,j}(t_j) \mid t_i) \\ \mathsf {reject} &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(8)
Round 2::

Neither player updates its beliefs based on the other’s actions in round 1. Player \(i\) makes a proposal \(\omega _{i,{j}}^{2^{*}}\) that satisfies Eq. 4. In turn, player \(j\) accepts any proposal according to the strategy \(a^2_j(\omega ,t_j)\) specified in Eq. 2.

4.2 Revelation by neither player

In this case, neither player discloses its true type at the onset of the game. We make the following definitions. Let \(E_{h^1}\big [ \pi _{i}\big (\omega _{i,{j}}^{2}\mid t_{i}\big )\big ]\) denote the expected benefit to \(i\) of proposal \(\omega _{i,{j}}^{2}\), given its updated beliefs over \(j\)’s types and that \(j\) accepts beneficial proposals.

$$\begin{aligned} E_{h^1}\left[ \pi _{i}\left( \omega _{i,{j}}^{2}\mid t_{i}\right) \right] =\sum _{t_{j}\in T_{j}}p\left( t_{j}\mid h^{1}\right) \pi _{i}\left( \omega _{i,{j}}^{2},a_{j}^{2},t_j \mid t_{i}\right) \end{aligned}$$
(9)

Let \(\omega _{i,{j}}^{2^{*}}(h^1,t_i)\) be the proposal in round 2 of player \(i\) of type \(t_i\) that maximizes Eq. 9 given the history \(h^1\):

$$\begin{aligned} \omega _{i,{j}}^{2^{*}}(h^1,t_i)\in \underset{\omega _{i,{j}}^{2}\in \varOmega _{i,j}}{\mathrm{arg }\,\mathrm{max }}\;E_{h^1}\left[ \pi _{i}\left( \omega _{i,{j}}^{2}\mid t_{i}\right) \right] \end{aligned}$$
(10)

Let \(\omega _{j,i}^{1^*}\) be the proposal of player \(j\) in round 1 that provides it with the same benefit as the counter-proposal made by one of the types of player \(i\) in round 2. This proposal does not depend on the type of \(j\).

$$\begin{aligned} \omega _{j,i}^{1^*} \in \{ \omega _{i,{j}}^{2^{*}}(h^1,t_{i_1}), \omega _{i,{j}}^{2^{*}}(h^1,t_{i_2}) \} \end{aligned}$$
(11)

where \(\omega _{i,{j}}^{2^{*}}(h^1,t_i)\) satisfies Eq. 10.
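When \(j\)’s type is unknown, the search of Eq. 4 simply weights each type by the belief \(p(t_j\mid h^1)\). A sketch of Eqs. 9 and 10 under the same assumptions as before (benefit_j now also takes \(j\)’s candidate type):

```python
def best_counter_proposal_under_uncertainty(proposals, types_j, belief,
                                            benefit_i, benefit_j):
    """Eq. 10: maximize the expected benefit of Eq. 9, where the
    expectation runs over j's possible types weighted by p(t_j | h^1)."""
    def expected_benefit(omega):
        return sum(
            belief[t_j] * (benefit_i(omega) if benefit_j(omega, t_j) >= 0 else 0)
            for t_j in types_j
        )
    return max(proposals, key=expected_benefit)
```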

We define a perfect Bayesian equilibrium as follows:

Round 0::

Neither player reveals its type, for each of its types.

Round 1::

We distinguish between two cases. In each case we specify how \(i\) updates its beliefs about \(j\)’s type and which proposal \(j\) makes in round 1. In the first case, player \(j\) did not reveal its type (i.e., followed the equilibrium strategy). In this case, player \(i\) does not update its beliefs over \(j\)’s types. In the second case, player \(j\) revealed its type. In this case, player \(i\) updates its beliefs to assign probability 1 to \(j\)’s type. (And similarly for player \(i\).) We now describe players’ actions in round 1. Player \(j\) makes a proposal \(\omega _{j,i}^{1^*}\) that satisfies Eq. 11. In turn, player \(i\) uses strategy \(a_i^1(\omega ,t_i)\) such that

$$\begin{aligned} a_{i}^{1}(\omega ,t_i)={\left\{ \begin{array}{ll} \mathsf {accept} &{}\quad if \, \pi _i( \omega \mid t_i) \ge \pi _i(\omega ^{2^*}_{i,j}(h^1,t_i) \mid t_i) \\ \mathsf {reject} &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(12)

where proposal \(\omega _{i,j}^{2^*}(h^1,t_i)\) satisfies Eq. 10.

Round 2::

Player \(i\) does not update its beliefs over \(j\)’s types (and similarly for player \(j\)). Player \(i\) makes a proposal \(\omega _{i,{j}}^{2^{*}}(h^1,t_i)\) that satisfies Eq. 10. Player \(j\) accepts any beneficial proposal following Eq. 2.

4.3 Sufficient condition for equilibria

We now describe sufficient conditions under which the revelation and non-revelation equilibria hold. For simplicity, we assume that player \(j\) has two types, \(t_{j_1}\) and \(t_{j_2}\). Suppose without loss of generality that \(j\) is of type \(t_{j_1}\). Let \(\omega _{i,j_1}^{2*}\) be the proposal made by player \(i\) to player \(j\) of type \(t_{j_1}\) that satisfies Eq. 4 (when player \(j\) reveals its type), and let \(\omega _{i,j}^{2*}(h^1)\) be the proposal that satisfies Eq. 10 (when player \(j\) does not reveal its type), where \(h^1\) is the history up until round 2. We state two propositions:

Proposition 1

If one of \(j\)’s types is preferable (Eq. 6) for \(j\), then the strategies specified in Sect. 4.1 constitute a perfect Bayesian equilibrium in which both players \(i\) and \(j\) reveal their types, for each of their types, in round 0.

Proposition 2

If the conditions below hold, then the strategies specified in Sect. 4.2 constitute a perfect Bayesian equilibrium in which neither player reveals its type, for each of its types, in round 0.

The first condition says that \(j\) prefers not to disclose its type. Formally, for every type \(t_j\) and \(t_{i}\), we require that

$$\begin{aligned} \pi _j(\omega _{i,j}^{2*}(t_{j})\mid t_j) \le \pi _j\left( \omega _{i,j}^{2*}(h^1,t_i) \mid t_{j}\right) \end{aligned}$$
(13)

The second condition says that any round-1 proposal that provides \(i\) with a benefit at least as high as that of its round-2 counter-proposal leaves \(j\) (weakly) worse off, for each of its types. Formally, we require that for every type \(t_{j}\) and \(t_{i}\), the following holds.

$$\begin{aligned} \forall \omega _{j,i}^1:\quad \text{if } \pi _i\left( \omega _{j,i}^1 \mid t_i\right) \ge \pi _i\left( \omega _{i,j}^{2*}(h^1,t_i)\mid t_{i}\right) \text{ then } \pi _j\left( \omega _{j,i}^1 \mid t_{j}\right) \le \pi _j\left( \omega _{i,j}^{2*}(h^1,t_i)\mid t_{j}\right) \end{aligned}$$
(14)

In the Appendix, we provide proofs of the propositions above, and also show that the sufficient conditions hold in the board games used in our study.

4.4 Adaptation of equilibria to CT game

We will exemplify the equilibrium strategies in the revelation games for the two boards in Fig. 1a, b and demonstrate that they are in equilibrium. In this analysis, each of the two types of each player is assigned a 50 % prior probability. For expository convenience we assume that the “me” player is the first proposer in the game, as shown in the figure.Footnote 4

We begin with the symmetric game in Fig. 1a. Here, both players are located at equal distance from each of their possible goals. The no-negotiation alternative score for both players is 80 points. The “me” player has 24 chips at the onset of the game and is missing one purple chip to reach the goal square from its initial location. In this game we describe an equilibrium in which neither player reveals its goal.

Round 0::

Neither of the players reveals its goal.

Round 1::

The “me” player will propose to give 21 chips (7 olive-green, 7 orange and 7 grey chips) in return for two chips (one purple and one green chip). This proposal will yield a score of 105 for the “me” player and a score of 295 for the “O” player (these scores are verified in the worked computation below). The “O” player accepts any proposal that provides it with higher benefit than the proposal it will make in round 2.

Round 2::

The “O” player makes the same offer as in round 1. The “me” player accepts any proposal that provides it with positive benefit.
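The round-1 scores above follow directly from the scoring rule of Sect. 3 (a worked check, assuming the “O” player also starts with 24 chips four steps from its goal, which is consistent with its stated no-negotiation score of \(5\cdot 24-10\cdot 4=80\) points): each player ends with its initial chips adjusted by the exchange, minus the four chips surrendered while moving to the goal, plus the 100-point bonus.

$$\begin{aligned} s_{me}&=100+5\cdot (24-21+2-4)=105\\ s_{O}&=100+5\cdot (24+21-2-4)=295 \end{aligned}$$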

In contrast to the symmetric game, in the asymmetric game each player’s distance from the goal depends on its true goal location; thus the no-negotiation alternative score depends on the player’s type. We therefore distinguish between two possible types of players, weak and strong. The weak player is missing two chips to get to the goal, and its no-negotiation alternative score is 70 points. The strong player is missing a single chip to get to the goal, and its no-negotiation alternative score is 90 points. In Fig. 1b, which shows players’ starting positions on the game board, the “me” player’s type is strong, because its starting position is located closer to its true goal location (the highlighted square marked “G”) than to its other possible goal location. For the asymmetric game of Fig. 1b we present equilibrium strategies that depend on the type of the “me” player.

Round 0::

Both players disclose their goals.

Round 1::

The proposal made by the “me” player depends on its known type. A weak player will propose 10 green chips and 11 gray chips to player “O” in return for 2 purple chips. A strong player will propose 10 green chips and 12 gray chips to player “O” in return for 1 purple chip. The “O” player accepts any proposal that provides it with higher benefit than the proposal it will make in round 2.

Round 2::

The “O” player makes the same offer as the one made in round 1, which depends on the “me” player’s type. The “me” player accepts any proposal with positive benefit.

5 The SIGAL agent

In this section we describe the Sigmoid Acceptance Learning Agent (SIGAL). SIGAL uses a decision-theoretic approach to negotiate in revelation games that is based on a model of how humans make decisions in the game. Before describing the strategy used by SIGAL we make the following definitions.

For the remainder of this section, we assume that the SIGAL agent (denoted \(a\)) is playing a person (denoted \(p\)). Let \(\omega _{a,p}^{n}\) represent an offer made by the agent to the person in round \(n\) and let \(r_{p}^{n}\) represent the response of the person to \(\omega _{a,p}^{n}\). The expected benefit to SIGAL from \(\omega _{a,p}^{n}\) given history \(h^{n-1}\) and SIGAL’s type \(t_{a}\) is denoted \(E_{a}\big (\omega _{a,p}^{n}\mid h^{n-1},t_{a}\big )\). Let \(p(r_{p}^{n}= \mathsf {accept}\mid \omega _{a,p}^{n},h^{n-1})\) denote the probability that \(\omega _{a,p}^{n}\) is accepted by the person given history \(h^{n-1}\).

We now specify the strategy of SIGAL for the revelation game defined in Sect. 3. The strategy assumes there exists a model of how humans make and accept offers in both rounds; we describe how to estimate the parameters of this model in Sect. 6. We begin by describing the negotiation strategies of SIGAL for rounds 2 and 1.

Round 2: If SIGAL is the second proposer, its expected benefit from an offer \(\omega _{a,p}^{2}\) depends on its model of how people accept offers in round 2, encapsulated in the probability \(p(r_{p}^{2}= \mathsf {accept}\mid \omega _{a,p}^{2},h^{1})\). The expected benefit to SIGAL is

$$\begin{aligned}&E_{a} \left( \omega _{a,p}^{2}\mid h^{1},t_{a}\right) \nonumber \\&\quad =\pi _{a}(\omega _{a,p}^{2}\mid t_{a})\cdot p(r_{p}^{2}= \mathsf {accept}\mid \omega _{a,p}^{2},h^{1}) \nonumber \\&\qquad +\,\pi _{a}(\emptyset \mid t_{a})\cdot p(r_{p}^{2}=\mathsf {reject}\mid \omega _{a,p}^{2},h^{1}) \end{aligned}$$

Here, the term \(\pi _{a}(\emptyset \mid t_{a})\) represents the benefit to SIGAL from the NNA score, which is zero. SIGAL will propose an offer that maximizes its expected benefit in round 2 out of all possible proposals for this round.

$$\begin{aligned} \omega _{a,p}^{2*}=\underset{\omega _{a,p}^{2}}{\mathrm{arg }\,\mathrm{max }}\;E_{a}\left( \omega _{a,p}^{2}\mid h^{1},t_{a}\right) \end{aligned}$$
(15)
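Operationally, Eq. 15 is a search over the round-2 proposal space that weights each proposal’s benefit by its predicted acceptance probability. A minimal sketch (accept_prob stands for the learned model of Eq. 28 applied to round 2, and benefit_a for Eq. 1; both names are ours):

```python
def sigal_round2_proposal(proposals, benefit_a, accept_prob):
    """Eq. 15: maximize benefit * P(accept); a rejected proposal
    yields the no-negotiation-alternative benefit, which is zero."""
    return max(proposals, key=lambda w: benefit_a(w) * accept_prob(w))
```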

If SIGAL is the second responder, its optimal action is to accept any proposal from the person that gives it positive benefit as described in the equilibrium strategy (Eq. 2). Let \(r_{a}^{2*}(\omega _{p,a}^{2}\mid h^{1})\) denote the response of SIGAL to offer \(\omega _{p,a}^{2}\), defined as

$$\begin{aligned} r_{a}^{2*}(\omega _{p,a}^{2}\mid h^{1})={\left\{ \begin{array}{ll} \mathsf {accept} &{}\quad \pi _{a}(\omega _{p,a}^{2}\mid t_{a})>0\\ \mathsf {reject} &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(16)

where \(\pi _{a}(\omega _{p,a}^{2}\mid t_{a})\) is defined in Eq. 1. The benefit to SIGAL from this response is defined as

$$\begin{aligned}&\pi _{a} \left( r_{a}^{2*}\mid \omega _{p,a}^{2},h^{1},t_{a}\right) \nonumber \\&\quad ={\left\{ \begin{array}{ll} \pi _{a}(\omega _{p,a}^{2}\mid t_{a}) &{}\quad r_{a}^{2*}(\omega _{p,a}^{2}\mid h^{1})= \mathsf {accept}\\ \pi _{a}(\emptyset \mid t_a) &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(17)

Round 1: If SIGAL is the first proposer, its expected benefit from making a proposal \(\omega _{a,p}^{1}\) depends on its model of the person: If the person accepts \(\omega _{a,p}^{1}\), then the benefit to SIGAL is simply \(\pi _{a}(\omega _{a,p}^{1}\mid t_{a})\). If \(\omega _{a,p}^{1}\) is rejected by the person, then the benefit to SIGAL depends on the counter-proposal \(\omega _{p,a}^{2}\) made by the person in round 2, which itself depends on SIGAL’s model \(p(\omega _{p,a}^{2}\mid h^{1})\) of how people make counter-proposals. The expected benefit to SIGAL from behaving optimally as a second responder for a given offer \(\omega _{p,a}^{2}\) is denoted \(E_{a}(resp^2\mid h^{1},t_{a})\), and defined as

$$\begin{aligned}&E_{a} (resp^2\mid h^{1},t_{a})\nonumber \\&\quad =\sum _{\omega _{p,a}^{2}} p(\omega _{p,a}^{2}\mid h^{1})\cdot \pi _{a}(r_{a}^{2*}\mid \omega _{p,a}^{2},h^{1},t_{a}) \end{aligned}$$
(18)

where \(\pi _{a}(r_{a}^{2*}\mid \omega _{p,a}^{2},h^{1},t_{a})\) is defined in Eq. 17.

Its expected benefit from \(\omega _{a,p}^{1}\) is:

$$\begin{aligned}&E_{a} \left( \omega _{a,p}^{1}\mid h^{0},t_{a}\right) \nonumber \\&\quad = \pi _{a}(\omega _{a,p}^{1}\mid t_{a})\cdot p(r_{p}^{1}= \mathsf {accept}\mid \omega _{a,p}^{1},h^{0})\nonumber \\&\qquad +\, E_{a}(resp^2\mid h^{1},t_{a})\cdot p(r_{p}^{1}=\mathsf {reject}\mid \omega _{a,p}^{1},h^{0}) \end{aligned}$$
(19)

where \(h^1=\big \{h^0,\omega _{a,p}^1,r_p^1=\mathsf {reject}\big \}\). SIGAL will propose an offer in round 1 that maximizes its expected benefit in this round:

$$\begin{aligned} \omega _{a,p}^{1*}&=\underset{\omega _{a,p}^{1}}{\mathrm{arg }\,\mathrm{max }}\; E_{a}\left( \omega _{a,p}^{1}\mid h^{0},t_{a}\right) \end{aligned}$$
(20)
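Round 1 is thus solved by backward induction. A sketch under the same assumptions, where expected_resp2(w) stands for \(E_{a}(resp^2\mid h^{1},t_{a})\) of Eq. 18 with the rejected offer w folded into \(h^1\):

```python
def sigal_round1_proposal(proposals, benefit_a, accept_prob, expected_resp2):
    """Eq. 19-20: trade off the immediate benefit of acceptance against
    the expected value of responding optimally to the human
    counter-proposal in round 2 if the offer is rejected."""
    def expected_benefit(w):
        p = accept_prob(w)
        return p * benefit_a(w) + (1 - p) * expected_resp2(w)
    return max(proposals, key=expected_benefit)
```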

If SIGAL is the first responder, it accepts any offer that provides it with a larger benefit than it would get from making the counter-proposal \(\omega _{a,p}^{2*}\) in round 2, given its model of how people respond to offers in round 2:

$$\begin{aligned} r_{a}^{1*}(\omega _{p,a}^{1}\mid h^{0})={\left\{ \begin{array}{ll} \mathsf {accept} &{}\quad \pi _{a}(\omega _{p,a}^{1}\mid t_{a})>\\ &{} \, E_{a}\left( \omega _{a,p}^{2*}\mid h^{1},t_{a}\right) \\ \mathsf {reject} &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(21)

Here, \(h^{1}=\big \{h^{0},\omega _{p,a}^{1},r_a^1=\mathsf {reject}\big \}\), \(\pi _{a}(\omega _{p,a}^{1}\mid t_{a})\) is defined in Eq. 1 and \(E_{a}\big (\omega _{a,p}^{2*}\mid h^{1},t_{a}\big )\) is the benefit to SIGAL from making an optimal proposal \(\omega _{a,p}^{2*}\) at round 2, as defined in Eq. 15.

Let \(\pi _{a} \big (r_{a}^{1*}\mid \omega _{p,a}^{1},h^{0},t_{a}\big )\) denote the benefit to SIGAL from its response to offer \(\omega _{p,a}^{1}\) in round 1. If SIGAL accepts this offer, it receives the benefit associated with \(\omega _{p,a}^{1}\). If it rejects this offer, it will receive the expected benefit \(E_{a}\big (\omega _{a,p}^{2*}\mid h^{1},t_{a}\big )\) from making an optimal counter-proposal at round 2:

$$\begin{aligned}&\pi _{a} \left( r_{a}^{1*}\mid \omega _{p,a}^{1},h^{0},t_{a}\right) \nonumber \\&\quad ={\left\{ \begin{array}{ll} \pi _{a}(\omega _{p,a}^{1}\mid t_{a}) &{}\quad r_{a}^{1*}(\omega _{p,a}^{1}\mid h^{0})=\mathsf {accept}\\ E_{a}\left( \omega _{a,p}^{2*}\mid h^{1},t_{a}\right) &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(22)

The expected benefit to SIGAL as a responder in round 1 is denoted as \(E_{a} \left( resp^1\mid h^{0},t_{a}\right) \). This benefit depends on its model of all possible offers made by people for each type, given that SIGAL responds optimally to the offer.

$$\begin{aligned}&E_{a} \left( resp^1\mid h^{0},t_{a}\right) =\sum _{t_{p}\in T_{p}}p(t_{p}\mid h^0)\cdot \nonumber \\&\quad \left( \sum _{\omega _{p,a}^{1}} p(\omega _{p,a}^{1}\mid t_{p},h^{0})\cdot \pi _{a}(r_{a}^{1*}\mid \omega _{p,a}^{1},h^{0},t_{a})\right) \end{aligned}$$
(23)

Note that when the person reveals his/her type at round 0, this is encapsulated in the history \(h^{0}\), and \(p(t_{p}\mid h^0)\) equals \(1\) for the person’s true type. Otherwise \(p(t_{p}\mid h^0)\) equals the probability \(p(t_p)\).

Round 0: In the revelation round SIGAL needs to decide whether to reveal its type. Let \(E_a (h^0, t_a)\) denote the expected benefit to SIGAL given that \(h^0\) includes a revelation decision for both players and that \(t_a\) is the type of the agent. This benefit depends on the probability that SIGAL is chosen to be a proposer \((p(prop))\) or responder \((p(resp))\) in round 1:

$$\begin{aligned} E_a (h^0,t_a)&= p(resp)\cdot E_{a}\left( resp^1\mid h^{0},t_{a}\right) \nonumber \\&+p(prop)\cdot E_{a}(\omega _{a,p}^{1*}\mid h^{0},t_{a}) \end{aligned}$$
(24)

Here, \(\omega _{a,p}^{1*}\) is the optimal proposal for SIGAL in round 1, and \(E_{a}\left( \omega _{a,p}^{1*}\mid h^{0},t_{a}\right) \) is the expected benefit associated with this proposal, defined in Eq. 19.

Because players do not observe each other’s revelation decisions, the expected benefit for a revelation decision \(\phi _a\) of the SIGAL agent sums over the cases in which the person revealed its type (i.e., \(\phi _p=t_p\)) or did not reveal its type (i.e., \(\phi _p=null\)). We denote by \(p(\phi _{p}=t_{p})\) the probability that the person revealed its type \(t_p\), and by \(p(\phi _{p}=null)\) the probability that the person did not reveal its type.

$$\begin{aligned} E_{a} \left( \phi _{a}\right)&= \sum _{t_{p}\in T_{p}}\left[ p(\phi _{p}=t_{p}) \cdot \right. E_{a}\left( h^{0}=\left\{ \phi _{a},\phi _{p}=t_{p}\right\} ,t_{a}\right) \nonumber \\&+ p(\phi _{p}=null)\cdot E_{a}\left( h^{0}=\left\{ \phi _{a},\phi _{p}=null\right\} , t_{a}\right) \left. \right] \end{aligned}$$
(25)

Given that SIGAL is of type \(t_{a}\in T_{a}\), it reveals its type only if its expected benefit from revelation is greater or equal to not revealing its type:

$$\begin{aligned} \phi _{a}^*={\left\{ \begin{array}{ll} t_{a} &{}\quad E_{a}\left( \phi _{a}=t_{a}\right) \ge \\ &{} \, E_{a}\left( \phi _{a}=null\right) \\ null &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(26)

The value of the game for SIGAL for making the optimal decision whether to reveal its type is defined as \(E_{a} \left( \phi ^*_{a}\right) \).
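The revelation decision therefore reduces to comparing two expectations. A sketch, reading \(p(\phi _p=t_p)\) in Eq. 25 as the joint probability \(p(t_p)\,p(\phi _p \mid t_p)\) obtained from the prior and the model of Eq. 32; value_of_game stands for \(E_a(h^0, t_a)\) of Eq. 24, and all names are ours:

```python
def sigal_revelation_decision(t_a, types_p, p_type, p_reveal, value_of_game):
    """Eq. 25-26: reveal the goal iff the expected value of revealing
    is at least the expected value of not revealing (None)."""
    def expected_value(phi_a):
        return sum(
            p_type[t_p] * (p_reveal[t_p] * value_of_game(phi_a, t_p)
                           + (1 - p_reveal[t_p]) * value_of_game(phi_a, None))
            for t_p in types_p
        )
    return t_a if expected_value(t_a) >= expected_value(None) else None
```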

Lastly, we wished SIGAL to take a risk-averse approach to making decisions in the game. Therefore SIGAL used a concave function to represent its utility in the game from an offer \(\omega ^{n}\), which modifies Eq. 1.

$$\begin{aligned} \pi '_{a}(\omega ^{n}\mid t_{a}) =\frac{\pi _{a}(\omega ^{n}\mid t_{a})^{(1-\rho )}}{1-\rho } \end{aligned}$$
(27)

The strategy used by SIGAL is obtained by “plugging in” the risk-averse utility \(\pi '_{a}(\omega ^{n}\mid t_{a})\) in place of \(\pi _{i}(\omega ^{n}\mid t_{i})\).
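The transform itself is one line (a sketch; the value of \(\rho \) is left as a parameter here, and the transform is applied to non-negative benefits):

```python
def risk_averse_utility(benefit: float, rho: float) -> float:
    """Eq. 27: concave transform of a non-negative benefit.
    rho = 0 recovers the raw benefit of Eq. 1; 0 < rho < 1 makes
    the agent increasingly risk averse."""
    return benefit ** (1 - rho) / (1 - rho)
```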

6 Modeling human players

In this section we describe a model of people’s behavior used by SIGAL to make optimal decisions in the game. We assume that there is a training set of games played by people, as we show in the next section.

6.1 Accepting proposals

We modeled people’s acceptance of proposals in revelation games using a stochastic model that depended on a set of features. These comprised past actions in the game (e.g., a responder may be more likely to accept a given offer if it revealed its type as compared to the case in which it did not reveal its type) as well as social factors (e.g., a responder player may be less likely to accept a proposal that offers more benefit to the proposer than to itself).Footnote 5

Let \(\omega _{i,j}^{n}\) represent a proposal from a player \(i\) to a player \(j\) at round \(n\). We describe the following features that affect the extent to which player \(j\) will accept proposal \(\omega _{i,j}^{n}\). These features are presented from the point of view of proposer \(i\); therefore we assume that the type of the proposer \(t_{i}\) is known, while the type of the responder \(t_{j}\) is known only if \(j\) revealed its type. We first detail the features that relate to players’ decisions whether to reveal their types.

  • \(REV_{j}^{0}\). Revelation by \(j\). This feature equals \(1\) if the responder \(j\) has revealed its type and 0 otherwise. The superscript \(0\) indicates this feature is relevant to the revelation phase, which is round 0.

  • \(REV_{i}^{0}\). Revelation by \(i\). This feature equals 1 if the proposer has revealed its type \(t_{i}\).

We now describe the set of features relating to social factors of the responder player \(j\).

  • \(BEN_{j}^{n}\). Benefit to \(j\). The benefit to \(j\) from proposal \(\omega _{i,j}^{n}\) in round \(n\). This measures the extent to which the proposal \(\omega _{i,j}^{n}\) is generous to the responder. In the case where \(j\) revealed its type, this feature equals \(\pi _{j}(\omega _{i,j}^{n}\mid t_{j})\) and is computed directly from Eq. 1. Otherwise, the value of this feature is the expected benefit to the responder from \(\omega _{i,j}^{n}\) over all possible responder types \(T_{j}\):

    $$\begin{aligned} \sum _{t_{j}\in T_{j}}p(t_{j}\mid h^{n-1})\cdot \pi _{j}(\omega _{i,j}^{n}\mid t_{j}) \end{aligned}$$
  • \(AI_{i}^{n}\). Advantageous inequality of \(i\). The difference between the benefit to proposer \(i\) and responder \(j\) that is associated with proposal \(\omega _{i,j}^{n}\). This measures the extent to which proposer \(i\) is competitive, in that \(\omega _{i,j}^{n}\) offers more for \(i\) than for \(j\). This feature equals the difference between \(\pi _{i}(\omega _{i,j}^{n},\mathsf {accept}\mid t_{i})\) and \(BEN_{j}^{n}\).

To capture the way the behavior in round \(n=1\) affects the decisions made by participants in round \(n=2\), we added the following features that refer to past offers.

  • \(P.BEN_{j}^{n}\). Benefit to \(j\) in the previous round. This feature equals \(BEN_{j}^{1}\) if \(n=2\), and 0 otherwise.

  • \(P.BEN_{i}^{n}\). Benefit to proposer \(i\) in the previous round. This feature equals \(\pi _{i}(\omega _{i,j}^{1},\mathsf {accept}\mid t_{i})\) if \(n=2\) and 0 otherwise.

To illustrate, consider the asymmetric CT board game shown in Fig. 1b. Alice is missing two green chips to get to the goal and Bob is missing one purple chip to get to the goal. Suppose that Bob is the first proposer (player \(i\)), that Alice is the first responder (player \(j\)), and that Bob revealed its goal to Alice, so its type is common knowledge, while Alice did not reveal her goal. We thus have that \(REV_{j}^{0}=0\) and \(REV_{i}^{0}=1\). Alice’s no-negotiation alternative (NNA) score, \(s_{j}(\emptyset )\), is \(70\) points and Bob’s NNA score is \(90\) points.

According to the offer shown in the Figure, Bob offered two green chips to Alice in return for two purple chips. If accepted, this offer would allow Alice to get to the goal in 5 steps, so she will have 19 chips left at the end of the game, worth \(19\cdot 5=95\) points. Similarly, Bob will have 21 chips left at the end of the game, worth 105 points. Both will also earn a bonus of 100 points for getting to the goal. Therefore we have that \(BEN_{j}^{1}=95+100-70=125\). Similarly, Bob’s benefit from this proposal is \(105+100-90=115\) points. The difference between the benefit to Bob and to Alice is \(-10\), so we have that \(AI_{i}^{1}=-10\). Lastly, because the offer is made in round 1, we have that \(P.BEN_{j}^{1}=P.BEN_{i}^{1}=0\). This offer is more generous to Alice than it is to Bob.

Suppose now that Alice rejects this offer and makes a counter-proposal in round 2 that offers one purple chip to Bob in return for four green chips. In this example, Alice is using her knowledge of Bob’s type to make the minimal offer that would allow Bob to reach the goal while providing additional benefit to Alice. Alice is now the proposer (player \(i\)) and Bob is the responder (player \(j\)). Recall that Bob has revealed its goal while Alice did not, so we have \(REV_{j}^{0}=1\) and \(REV_{i}^{0}=0\). Using a computation similar to the one above, we get that Bob’s score from the counter-proposal is 190 points. Therefore we have that \(BEN_{j}^{2}=190-90=100\). Alice’s benefit from the counter-proposal is \(210-70=140\), therefore we have that \(AI_{i}^{2}=140-100=40\). The last features in the example capture the benefit to both players from the proposal made in the first round to Alice and Bob, so we have \(P.BEN_{j}^{2}=125\), and \(P.BEN_{i}^{2}=115\).

6.1.1 Social utility function

We model the person as using a social utility function to decide whether to accept proposals in the game. This social utility depends on a weighted average of the features defined above. We define a transition function, \(T^{n}\), that maps an offer \(\omega ^{n}\) and history \(h^{n-1}\) to an (ordered) set of feature values \(x^{n}\) as follows.Footnote 6

$$\begin{aligned} x^{n}=\left( REV_{j}^{0},REV_{i}^{0},BEN_{j}^{n},AI_{i}^{n},P.BEN_{j}^{n},P.BEN_{i}^{n}\right) \end{aligned}$$

To illustrate, in the example above, we have that \(x^{1}=\left( 0,1,125,-10,0,0\right) \) and \(x^{2}=\left( 1,0,100,40,125,115\right) \).

Let \(u(x^{n})\) denote the social utility function, defined as the weighted sum of these features. To capture the fact that a decision might be implemented noisily, we use a sigmoid function to describe the probability that people accept offers, in a similar way to past studies of modeling human behavior [11]. We define the probability of acceptance for particular feature values \(x^{n}\) by a responder to be

$$\begin{aligned} p(r_{i}^{n}=\mathsf {accept}\mid \omega ^{n},h^{n-1})=\frac{1}{1+e^{-u(x^{n})}} \end{aligned}$$
(28)

where \(x^{n}=T^{n}(\omega ^{n},h^{n-1})\). In particular, the probability of acceptance converges to 1 as \(u(x^{n})\) becomes large and positive, and to 0 as the utility becomes large and negative. We interpret the utility to be the degree to which one decision is preferred. Thus, the probability of accepting a proposal is higher when the utility is larger.
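A sketch of the acceptance model on the running example of Sect. 6.1 (the weights here are illustrative only; Sect. 6.1.2 describes how they are learned):

```python
import math

def accept_probability(x, w):
    """Eq. 28: sigmoid of the social utility u(x), the weighted sum of
    the features (REV_j, REV_i, BEN_j, AI_i, P.BEN_j, P.BEN_i)."""
    u = sum(w_k * x_k for w_k, x_k in zip(w, x))
    return 1.0 / (1.0 + math.exp(-u))

x1 = (0, 1, 125, -10, 0, 0)                 # Bob's round-1 offer to Alice
w = (0.5, 0.2, 0.02, -0.01, 0.005, -0.005)  # illustrative weights
print(accept_probability(x1, w))            # predicted P(Alice accepts)
```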

6.1.2 Estimating weights

To predict how people respond to offers in the game, we need to estimate the weights in their social utility function in a way that best explains the observed data. In general, we need to model the probability that an offer is accepted for any possible instantiation of the history. The number of possible proposals in round 1 is exponential in the combined chip set of the players.Footnote 7 It is not possible to use standard density estimation techniques because many such offers were not seen in the training set or were very rare. Therefore, we employed a supervised learning approach that assumed people used a noisy utility function to accept offers that depended on the features defined above. Let \(\varOmega _{i,p}\) denote a data set of offers proposed by some participant \(i\) to a person \(p\).Footnote 8 For each offer \(\omega ^n_{i,p}\in \varOmega _{i,p}\) let \(y(r_p^n\mid \omega ^n_{i,p})\) denote an indicator function that equals 1 if the person accepted proposal \(\omega _{i,p}^n\), and zero otherwise. The error of the predictor depends on the difference between \(y(r_p^n\mid \omega ^n_{i,p})\) and the predicted response \(p(r_{p}^{n}=\mathsf {accept}\mid \omega _{i,p}^{n},h^{n-1})\), as follows:

$$\begin{aligned} \sum _{\omega _{i,p}^n\in \varOmega _{i,p}} \left( p(r_{p}^{n}=\mathsf {accept}\mid \omega _{i,p}^{n},h^{n-1})- y(r^n_p \mid \omega _{i,p}^{n})\right) ^{2} \end{aligned}$$
(29)

where \(p(r_{p}^{n}=\mathsf {accept}\mid \omega _{i,p}^{n},h^{n-1})\) is defined in Eq. 28.

We used a standard genetic algorithm to estimate weight values for the features of people’s social utility that minimize the aggregate error on the training set. To avoid over-fitting the training set, we used a held-out cross-validation set consisting of 30 % of the data. We chose the instance with minimal error on the training set from the generation that corresponded to the smallest error on the cross-validation set. We used ten-fold cross-validation, repeating this process ten times, each time choosing different training and testing sets, producing ten candidate instances. To pick the best instance, we computed the value of the game \(E_{a} \big (\phi ^*_{a}\big )\) for SIGAL for each of the learned models, where \(\phi ^*_{a}\) is defined in Eq. 26. This is the expected benefit for SIGAL given that it chooses optimal actions using a model of people that corresponds to the feature weights in each instance.
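The quantity the genetic algorithm minimizes is exactly Eq. 29; a sketch of the fitness evaluation (the GA machinery itself, i.e., selection, crossover, and mutation, is standard and omitted; data is a name we introduce for a list of (feature-vector, decision) pairs):

```python
import math

def squared_error(w, data):
    """Eq. 29: sum of squared differences between the predicted
    acceptance probability (Eq. 28) and the observed decision y
    (1 = accepted, 0 = rejected)."""
    def p_accept(x):
        u = sum(w_k * x_k for w_k, x_k in zip(w, x))
        return 1.0 / (1.0 + math.exp(-u))
    return sum((p_accept(x) - y) ** 2 for x, y in data)
```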

6.2 Proposing and revealing

This section describes our model of how people make proposals in revelation games and reason about whether to reveal information.

6.2.1 First proposal model

We used standard density estimation techniques (histograms) to predict people’s offers for different types. Based on the assumption that proposals for the first round depend on the proposer’s type and its decision whether to reveal, we divided the possible proposals into equivalence classes according to the potential benefit for the proposer, and counted how many times each class appears in the training set. Let \(p(\omega _{p,j}^{1}\mid t_{p},\phi _{p})\) denote the probability that a human proposer of type \(t_p\) offers \(\omega _{p,j}^{1}\) in round 1. Let \(N_{t_{p},\phi _{p}}\big (\pi _{p}(\omega _{p,j}^{1}\mid t_{p})\big )\) denote the number of round-1 proposals that give the human a benefit of \(\pi _{p}(\omega _{p,j}^{1}\mid t_{p})\), given that the human is of type \(t_p\) and its revelation decision was \(\phi _{p}\). Let \(N_{t_{p},\phi _{p}}(\varOmega _{p,j}^{1})\) denote the total number of round-1 proposals made under these conditions. \(p(\omega _{p,j}^{1}\mid t_{p},\phi _{p})\) is defined as:

$$\begin{aligned} p(\omega _{p,j}^{1}\mid t_{p},\phi _{p})= \frac{N_{t_{p},\phi _{p}} \left( \pi _{p}(\omega _{p,j}^{1}\mid t_{p})\right) }{N_{t_{p},\phi _{p}}(\varOmega _{p,j}^{1})} \end{aligned}$$
(30)
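Eq. 30 is a conditional histogram; a sketch (training_offers is a name we introduce for a list of (proposer-benefit, type, revelation decision) records from the training set):

```python
from collections import Counter

def first_proposal_model(training_offers):
    """Eq. 30: empirical frequency of each proposer-benefit class,
    conditioned on the proposer's type and revelation decision."""
    class_counts, condition_counts = Counter(), Counter()
    for benefit, t_p, phi_p in training_offers:
        class_counts[(benefit, t_p, phi_p)] += 1
        condition_counts[(t_p, phi_p)] += 1

    def prob(benefit, t_p, phi_p):
        return class_counts[(benefit, t_p, phi_p)] / condition_counts[(t_p, phi_p)]

    return prob
```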

6.2.2 Counter-proposal model

According to our model, a player’s proposal in the second round also depends on the history; the resulting two-dimensional probability density function tends to be too sparse to estimate directly as described in Sect. 6.2.1. Inspired by studies showing that people engage in tit-for-tat reasoning [27], we used this principle to model the counter-proposals made by people. We assumed that a responder player \(i\) will be proposed an offer \(\omega _{p,i}^{2}\) by a human player in the second round whose benefit \(\pi _{i}(\omega _{p,i}^{2}\mid t_{i})\) is equal to the benefit \(\pi _{p}(\omega _{i,p}^{1}\mid t_{p})\) of the offer \(\omega _{i,p}^{1}\) made to the person in the first round, when the person was the responder. For example, suppose that Bob is the proposer in round 1 and proposes Alice a benefit of \(125\). According to the model, if Alice rejects the offer she will make a counter-proposal that provides Bob with that same benefit, \(125\). Note that this does not assume that the counter-proposal will provide Alice with the same benefit she was offered by Bob in round 1. Formally, let \(N_{\varOmega _{p,i}^{2}}(\pi _{p}(\omega _{i,p}^{1}\mid t_{p}))\) denote the number of counter-proposals \(\omega _{p,i}^{2}\) that give benefit \(\pi _{p}(\omega _{i,p}^{1}\mid t_{p})\). We assume that there always exists at least one proposal that meets this criterion, i.e., \(N_{\varOmega _{p,i}^{2}}(\pi _{p}(\omega _{i,p}^{1}\mid t_{p}))\ne 0\). The “tit for tat” heuristic is as follows:

$$\begin{aligned} p(\omega _{p,i}^{2}\mid h^{1})={\left\{ \begin{array}{ll} 0 &{}\quad \pi _{i}(\omega _{p,i}^{2}\mid t_i)\ne \pi _{p}(\omega _{i,p}^{1}\mid t_p)\\ {1}/{N_{\varOmega _{p,i}^{2}}(\pi _{p}(\omega _{i,p}^{1}\mid t_{p}))} &{}\quad \mathsf {otherwise}\end{array}\right. } \end{aligned}$$
(31)

This heuristic is used in Eq. 18 to facilitate the computation of the expected benefit to SIGAL as a responder in round 1.
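A sketch of the heuristic (benefit_to_proposer implements \(\pi _i\) for the round-1 proposer, and round1_benefit_to_person is the benefit \(\pi _p(\omega _{i,p}^1 \mid t_p)\) offered to the person in round 1; the names are ours):

```python
def counter_proposal_distribution(counter_proposals, benefit_to_proposer,
                                  round1_benefit_to_person):
    """Eq. 31: a uniform distribution over counter-proposals that return
    to the round-1 proposer exactly the benefit it offered the person;
    all other counter-proposals receive probability zero."""
    matching = [w for w in counter_proposals
                if benefit_to_proposer(w) == round1_benefit_to_person]
    # The model assumes at least one such counter-proposal exists.
    return [(w, 1.0 / len(matching)) for w in matching]
```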

Lastly, we detail the model used by SIGAL to predict whether the person reveals its goal. Let \(N_{t_{p}}\) denote the number of instances in which people were of type \(t_{p}\), and let \(N_{t_{p}}(\phi _{p})\) denote the number of times that people of type \(t_{p}\) made revelation decision \(\phi _{p}\). The probability that a human player \(p\) of type \(t_{p}\) makes revelation decision \(\phi _{p}\) is defined as:

$$\begin{aligned} p(\phi _{p}\mid t_{p})=\frac{N_{t_{p}}(\phi _{p})}{N_{t_{p}}} \end{aligned}$$
(32)

7 Empirical methodology

In this section we describe the methodology we used to learn the parameters of the model of how people play revelation games, and to evaluate it. For these purposes we recruited 260 students enrolled in a computer science or software engineering program at several universities and colleges. An additional 143 people were recruited using the Amazon Mechanical Turk framework. All subjects received an identical tutorial on revelation games that was exemplified on a board (not one of the boards used in the study). Actual participation was contingent on successfully answering a set of basic comprehension questions about the game. Participants were seated in front of a terminal for the duration of the study, and could not speak to any of the other participants. Each participant played two revelation games on different boards.

The boards in the study fulfilled the following conditions at the onset of the game: (1) There were two possible goals for each player; (2) Every player lacked one or two chips to reach each of its possible goals; (3) Every player possessed the chips that the other needed to get to each of its possible goals; (4) There existed at least one exchange of chips which allowed both players to reach each of their possible goals; (5) The two possible goals of each player were each assigned a probability of 50 %.

We used the asymmetric and symmetric boards shown in Fig. 1a, b. Participants played both the symmetric and the asymmetric boards in random order. They engaged in a neutral activity (answering demographic questions) between games to minimize the effects of their behavior in the first game on their behavior in the second game. The participant chosen to be the proposer in the first game was randomly determined, and participants switched roles in the second game, such that the proposer in the first game was designated as the responder in the second game. A central server (randomly) matched each participant with a human or an agent counterpart for each game. The identity of each participant was not disclosed. We collected players’ proposals, responses and revelation decisions for all of the games played. To avoid deception, all participants were told they would be interacting with either a computer or a person. Participants received fixed compensation (course credit) for participating in the experiment.Footnote 9

We divided subjects into four disjoint pools. The first pool consisted of people playing other people (66 games). The second pool consisted of people playing a computer agent that used a randomized strategy to make offers and responses (170 games). The purpose of this pool was to collect people’s actions in diverse situations, for example, their responses to offers that were never made by other people. Two thirds (44 games) of the data from the first pool, together with the data from the second pool, were used for training a model of people’s behavior. The third pool consisted of people playing the SIGAL agent (238 games). The fourth pool (118 games) consisted of people playing an agent using the equilibrium strategies defined in Sect. 4 to play revelation games. The equilibrium agent adopted a revelation or a non-revelation equilibrium strategy, as specified in Sects. 4.1 and 4.2, for each game.

8 Results and discussion

The performance of SIGAL was measured by comparing its play against people (the third pool) with people’s play against other people (the first pool). We list the number of observations and means for each result. All results reported in this section are statistically significant in the \(p<0.05\) range, using t tests for normally distributed data and two-sample Mann–Whitney rank tests for non-normally distributed data.

8.1 Analysis: general performance

We first present a comparison of the performance of SIGAL and people. Figure 2 shows the average benefit (the difference in score between agreement and the no-negotiation alternative score) for different roles (proposer and responder). As shown by the Figure, the SIGAL agent outperformed people in all roles. In addition, SIGAL was also more successful at reaching agreements than were people. Figure 3 shows the percentage of offers accepted by people for the different roles. As shown by the Figure, proposals made by SIGAL in round 1 were accepted 62 % of the time, while proposals made by people in round 1 were accepted only 49 % of the time. This difference is more pronounced in round 2, in which proposals made by SIGAL were accepted 83 % of the time, while offers made by people in round 2 were only accepted 63 % of the time. If an offer is rejected at this last round, the game ends without agreement. This striking difference shows that SIGAL learned to make good offers at critical points in the game.

Fig. 2 Performance comparison

Fig. 3 Agreement comparison

As shown in Fig. 2, SIGAL also outperformed the equilibrium agent in both rounds. The equilibrium agent was fully strategic and assumed the other player was unboundedly rational. Although not shown in the Figure, it made very selfish offers in the last round, offering people only 25 benefit points on average while keeping 215 benefit points for itself. Most of these offers (54 %) were rejected. In the first round, it made offers that were highly beneficial to people, offering an average of 219 points to people and 20 to itself. Most of these offers (82 %) were accepted, but the small benefit it derived from these proposals did not aid its performance.

Fig. 4 Average proposed benefit in first and second rounds

To explain the success behind SIGAL’s strategy, we present a comparison of the benefit from proposals made by the SIGAL agent and by people in both game rounds in Fig. 4. As shown by the Figure, both people and SIGAL made offers that were beneficial to both players in rounds 1 and 2. However, SIGAL made offers that were significantly more generous to human responders than did human proposers (118 benefit points provided by SIGAL as proposer in round 1 versus 96 points provided by human proposers; 110 benefit points provided by SIGAL as proposer in round 2 versus 81 benefit points provided by human proposers). In fact, the proposals made by SIGAL Pareto-dominated the proposals made by people in both rounds 1 and 2. Thus, SIGAL learned to make offers that were better for human responders without compromising its own utility.
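The Pareto-dominance claim here has a precise meaning: one offer dominates another if it is at least as good for both players and strictly better for at least one. A small check over hypothetical (proposer benefit, responder benefit) pairs:

    def pareto_dominates(offer_a, offer_b):
        # offer = (benefit to proposer, benefit to responder)
        return (all(a >= b for a, b in zip(offer_a, offer_b))
                and offer_a != offer_b)

For instance, with the illustrative pairs (100, 118) and (100, 96), the first offer dominates the second: the proposer is no worse off and the responder is strictly better off.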

SIGAL’s strategy is highlighted by examining the weights learned for the features that predict whether people accept offers. As shown in Table 1, the largest weight was assigned to \(BEN_j^n\), the benefit to the responder from an offer. In addition, the weight for \(AI_{i}^{n}\), measuring the difference between the benefit to the proposer and to the responder, was large and negative. This means that responders prefer proposals that provide them with large benefits, and are also competitive, in that they dislike offers that provide more to proposers than to responders. The offers made by SIGAL reflect these criteria. In particular, human proposers asked more for themselves than for responders in both rounds. In contrast, SIGAL equalized the difference in benefit between proposer and responder in round 1, and decreased the difference between its own benefit and the responder’s benefit in round 2 as compared to human proposers.

Table 1 Feature coefficient weights
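To make the role of these coefficients concrete, the acceptance model can be sketched as a logistic function over the features; the functional form and the numeric weights below are illustrative assumptions, not the coefficients reported in Table 1.

    import math

    def accept_probability(ben_responder, ben_proposer,
                           w_ben=0.05, w_ai=-0.03, bias=-3.0):
        # AI: advantageous inequality, positive when the proposer keeps
        # more benefit than it offers the responder.
        ai = ben_proposer - ben_responder
        z = bias + w_ben * ben_responder + w_ai * ai
        return 1.0 / (1.0 + math.exp(-z))   # probability responder accepts

Under such a model, raising the responder’s benefit while keeping the benefit gap small increases the acceptance probability on both counts, which is exactly the pattern observed in SIGAL’s offers.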

8.2 Analysis: revelation of goals

We now turn to analyzing the effect of goal revelation on the behavior of SIGAL. Recall that \(E_{a} \left( \phi ^*_{a}=t_a \right) \) denotes the value of the game for SIGAL when it decides to reveal its goal in round 0 and behaves optimally according to its model of how people make offers. Similarly, \(E_{a} \left( \phi ^*_{a}=null \right) \) denotes the value of the game for SIGAL when it decides not to reveal its goal in round 0. Our model predicted no significant difference in value to SIGAL between revealing and not revealing its goal, i.e., \(E_{a} \left( \phi ^*_{a}=null \right) \approx E_{a} \left( \phi ^*_{a}=t_a \right) \) for each type \(t_a\in T_a\). We therefore used two types of SIGAL agents, one that consistently revealed its goal at the onset of the game and one that never revealed. In all other respects these agents followed the model described in Sect. 5. The empirical results confirmed the model’s prediction, in that there was no significant difference in the performance of the two SIGAL agents across all boards and types used in the empirical study.
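The revelation decision itself reduces to comparing the two expected values above; a minimal sketch follows, where expected_value() is a hypothetical stand-in for SIGAL’s learned model of the remainder of the game.

    def choose_revelation(goal_type, expected_value):
        # Reveal at round 0 only if doing so increases the expected
        # value of the game under the learned model.
        e_reveal = expected_value(revealed=goal_type)   # E_a(phi* = t_a)
        e_silent = expected_value(revealed=None)        # E_a(phi* = null)
        return goal_type if e_reveal > e_silent else None

Because the two expected values were approximately equal for every type and board used here, this rule is effectively indifferent, consistent with fielding both a revealing and a non-revealing variant.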

However, the person’s decision to reveal or not reveal his or her goal had a significant effect on the negotiation strategy of SIGAL. When people revealed their goals, SIGAL was significantly more competitive than when people did not reveal their goals. Specifically, the average value of the competitiveness feature \(AI_{i}^{n}\) in proposals made by SIGAL in round 1 was 14 points when people revealed their goal, and \(-3\) points when people did not. That is, SIGAL learned to make significantly more competitive offers when people revealed their goals.

8.3 Analysis: people’s strategies

We now describe several features of people’s behavior that demonstrate strategic play resembling the way SIGAL played. First, in the asymmetric game, people in weak-type roles were more likely to reveal their goals than people in strong-type roles. In particular, in 65 % of games played in the asymmetric condition, strong-type players engaged in “bluffing”, that is, not disclosing their true goal and asking for more chips than they actually needed to reach the goal in round 1. Similarly, in 56 % of games in the asymmetric condition, weak-type players engaged in bluffing. Interestingly, this trend was also apparent in the symmetric game, in which bluffing occurred in 63 % of the games played. Second, people’s proposals were significantly more selfish and less generous in round 2 than in round 1, as is apparent from Fig. 4. Interestingly, people’s performance was significantly higher when playing with SIGAL than when playing with other people. Specifically, people playing with SIGAL achieved an average performance of 112 points, while people playing with other people achieved an average performance of 91 points. This shows that SIGAL had a positive effect on people’s play.

The last part of the analysis compares the performance of subjects enlisted using Amazon Mechanical Turk (MTurk) with that of students. Several works have duplicated lab studies using MTurk and reported similar patterns of behavior [1]. In our work, we found that in general, students were better performers than MTurk workers (an average score of 123 points for students versus 109 points for MTurk workers). This can be explained by the fact that students were significantly more selfish as second proposers (an average of 45 points for students versus 20 points for MTurk workers). Also, MTurk workers accepted significantly more proposals than did students in the first round (a 75 % average acceptance rate for MTurk workers compared to 59 % for students). These results suggest that integrating information revelation with negotiation may have required more cognitive effort from people than canonical decision-making tasks.

8.4 Limitations

We conclude this section by describing several limitations of our approach. First, our definition of revelation games, which the SIGAL decision-making model is tailored to support, defines a single revelation phase followed by two rounds of take-it-or-leave-it offers. This model fails to describe more involved negotiations, which may include additional rounds or multiple opportunities for revelation. Indeed, some agents may wish to reveal their goal only after failing to reach agreement over several rounds. Allowing for more involved revelation protocols raises significant computational challenges that remain outside the scope of this paper. Second, the decision-making model used by SIGAL in this paper assumes that all people use tit-for-tat reasoning when predicting their performance, which may not hold in practice. Indeed, our results showed considerable variance in people’s behavior across different countries. We hypothesize that learning a separate decision-making model for each country may further improve the prediction power of SIGAL and, consequently, its performance.

9 Conclusion and future work

This paper presented an agent-design for interacting with people in “revelation games”, in which participants are given the choice to truthfully reveal private information prior to negotiation. The decision-making model used by the agent reasoned about the social factors that affect people’s decisions whether to reveal their goals, as well as the effects of people’s revelation decisions on their negotiation behavior. The parameters of the model were estimated from data consisting of people’s interactions with other people. In empirical investigations, the agent was able to outperform people playing other people as well as agents playing equilibrium strategies, and it reached agreement significantly more often than did people. We are currently extending this work in two directions. First, we are considering more elaborate settings in which players are able to control the extent to which they reveal their goals. Second, we are using this work as the basis for a broader argumentation framework in which agents integrate explanations and justifications into their negotiation process. Lastly, information revelation strategies can also be adopted by the negotiation community at large, for example in the International Automated Negotiation Agent Competition (ANAC) [3].