1 Introduction

In many actual interactive decision situations (game situations), several decision makers interact strategically over time, but each of them has only partial knowledge of the intentions of the others. A particular example is an electricity market in which several production firms compete repeatedly over time and each firm knows its own costs but not the costs of the other firms [17, 42]. Such strategic interactions over time can be described by dynamic games with incomplete structural information.

Quite often it is very difficult to find a Nash equilibrium for dynamic or repeated games with incomplete information. Two difficulties may arise. The first is due to the “Witsenhausen effect” [44], i.e., the current action of each player affects the future state or parameter estimation of the other players. The second is due to the “dual control effect” [13], i.e., the current action of a player affects the quality of her own future parameter estimation. The latter effect makes the optimization problems involved very hard. Even in the degenerate game case (the one-player game), the optimal control problem with unknown parameters has not been solved analytically, except for a few special cases [45].

In this context, it has been proposed that the players (instead of trying to find the equilibrium strategies) use some simple deterministic or stochastic dynamic rules which determine the future actions of the players based on their past actions. The use of such rules usually reflects the bounded rationality [38, 41] of the participants of the game, i.e., their inability to solve very difficult problems. These rules can be models of learning, adaptation, evolution or imitation ([16, 40], [47, Ch. 7], [10, 27, 31]). Most of the research on dynamic rules focuses on convergence issues. However, a set of such strategies is typically not in Nash equilibrium.

A set of dynamic rules often converges to an equilibrium or a correlated equilibrium of a simplified game. For example, in repeated finite games, no-regret learning rules converge to a correlated equilibrium of the static full information game [21], and in the same class of games a set of uncoupled stochastic rules converges probabilistically to a Nash equilibrium of the corresponding static game [14]. Such results are often used to justify (or as an alibi for) applying the equilibrium notions of the simplified games (for example, static full information games) to the repeated or dynamic incomplete information game situation.

Are these adaptive/learning strategies a reasonable prediction for the evolution of the game? In particular, if the players knew that they were going to implement those strategies, would they stick to them? When a player implements an adaptive control law, she may be viewed by another player as a system under control. That is, the other player may try to use the knowledge of the first player’s adaptation law to manipulate her. The topic of this work is to study the phenomena that occur when the players try to manipulate one another and the implications that manipulation may have for the costs of the participants of the game.

1.1 Contribution

At first, a general form of adaptive/learning strategies is presented and an optimal control problem for strategic manipulation is stated. Then a smaller class of manipulating strategies is considered, called the “pretender’s” strategies: the manipulating player implements her learning/adaptation rule as if she had a different type (preferences) than her real one. Outcomes alternative to the Nash equilibrium are proposed for the cases in which one player pretends and in which all the players pretend. Under some technical conditions, if only one player pretends, she may achieve the same outcome as if she were a Stackelberg leader. We then study the case in which all the players are pretending, and some possible limit points are identified, using an auxiliary game called the “pretenders’ game.”

In order to gain some intuition, the notions described are applied to a very simple class of repeated quadratic opinion games, where analytical results are possible. First, games with two players are analyzed and relationships between pretending and Stackelberg leadership are derived. It turns out that pretending enhances competition among the players. In the case of a repeated Stackelberg interaction, if the Stackelberg follower is pretending, the follower can achieve approximately the same cost as if she were acting on the Stackelberg leader’s behalf.

The cases of repeated quadratic opinion games with many players having symmetric and asymmetric interactions are then analyzed. In the symmetric case, as the number of players increases, the motivation to pretend and the value of pretending decrease, and the outcome approaches the Nash equilibrium.

In the asymmetric influence case, the motivation of each player to pretend depends on a measure of the centrality of her position in the network of interactions. In the case where there is one influential player and many players each of which has a rather small influence, only the influential player has a non-negligible motivation to pretend, and the outcome approaches the Stackelberg equilibrium with the influential player as the leader. Thus, a “natural” Stackelberg leadership emerges.

A linear quadratic game with two players is then analyzed. The game describes the interaction of two countries emitting pollutants. In this example, each player has a motivation to pretend to have a lower discount factor, i.e., to pretend that she cares less about the future. The outcome of the game when the players are pretending is that they emit more and the environment is more polluted. In all the examples of quadratic and linear quadratic games analyzed, pretending is not a cooperative action.

Two competitive electricity market models are then studied. At first, a linear supply function model is considered. In this game situation, the players have a motivation to pretend to have a larger production cost in order to increase the price. Pretending in this model is a cooperative action, i.e., one player’s pretending is beneficial to the other. When all the players pretend, they produce less, the price is higher, and their profits increase, compared to the Nash case.

In the second electricity market model, a mechanism from the literature is considered [34]. The mechanism is designed to work optimally at the Nash equilibrium. However, each of the participants has a motivation to pretend. Furthermore, if all the participants pretend, the prices diverge and the system does not work at all.

1.2 Related Literature

A related topic is the literature on reputation effects [15, 26, 28]. These models consider a long-run, patient player and many short-lived or impatient opponents. Assuming that there is a possibility (even with small probability) that the patient player has a “committed” (or tough) type, the patient player may gain in equilibrium a strategic advantage similar to Stackelberg leadership. A model of two-sided reputation, resembling a war of attrition, was studied in [3]. The basic difference of the current work from the theory of reputation games is that in reputation games the roles of the players are asymmetric and specified a priori.

A very interesting related stream of research is the theory of indirect evolution ([12, 18, 19, 22, 23, 25]). These models assume rational players whose preferences are affected by evolutionary forces. The evolution of preferences may explain phenomena such as altruism, spite, overconfidence, fairness, and reciprocity ([23]). In [1] it is shown that if both the preferences (or the perceptions) and the actions are evolving, the players behave in an effectively rational way. The basic differences of the current work from the theory of indirect evolution are that the current work studies dynamic as well as repeated games and does not assume full rationality of the participants. Another related research area is the theory of coevolutionary games [33, 43]. In coevolutionary games, in addition to the players’ strategies, other properties evolve, such as the structure of the network describing the interactions or the players’ payoffs, leading to outcomes different from those of evolutionary games. An interesting class of telecommunication games is studied in [2, 4, 5]. In these examples, the players are better off if, instead of optimizing against their own preferences, they have an amount of altruism, i.e., optimize against a weighted sum of the payoffs of all the players. This effect is called “Paradox in Cooperation.”

A closely related topic is the theory of strategic delegation [39]. This theory studies situations where the acting party delegates its decision to another party having different interests.

Another related topic is the control of dynamic rules and strategic teaching. In [35], the optimal control of the best reply is studied, and in [36] it is shown that under all uncoupled learning rules the players have an incentive to deviate. Furthermore, in [37] it is shown that in a mixed population of myopic best response and mimicking players, the imitators are better off. The current work differs from the theory of strategic teaching in that, first, it does not assign different roles to the players a priori and, second, it suggests some alternative outcomes for the game.

In this work, we study the relationship between pretending and Stackelberg leadership. Endogenous models of Stackelberg leadership were studied in [7, 20].

1.3 Organization

The rest of this paper is organized as follows: In Sect. 2, the game and learning models are described. The pretenders’ strategies are studied in Sect. 3. The opinion quadratic games are studied in Sect. 4. Section 5 studies the LQ environmental game. Section 6 studies some electricity market games. Particularly, Sect. 6.1 studies a supply function model and Sect. 6.2 a mechanism presented in [34]. Finally, Sect. 7 summarizes the main contributions and proposes some future work.

2 Game Description and Adaptive Strategies

Let us start by describing an N-player game. The players have types \(\theta _1,\dots , \theta _N\). Each player knows her own type \(\theta _i\), and \(\theta _1,\dots , \theta _N\) are part of a random vector \(\bar{\theta } = [\theta _1 ~ \dots ~ \theta _N ~ ~\theta ]^T \in \bar{\varTheta } = \varTheta _1\times \dots \times \varTheta _N\times \varTheta \) which follows a commonly known distribution.

The dynamics has the form:

$$\begin{aligned} x_{k+1}=f\left( x_k,\bar{\theta },u_k^1,\dots ,u_k^N,w_k\right) , \end{aligned}$$
(1)

where \(u_k^1,\dots , u_k^N\) are the action variables of the players and \(w_k\) is a random disturbance with a commonly known distribution.

The cost functions have the form:

$$\begin{aligned} J_i = E\left[ \sum _{k=0}^T \rho _{\theta _i}^k L_i\left( x_k,\theta _i,u_k^1,\dots ,u_k^N\right) \right] , \end{aligned}$$
(2)

where \(\rho _{\theta _i}\in (0,1]\) is a discount parameter and T can be finite or infinite. An alternative formula for the costs is:

$$\begin{aligned} J_i = \lim _{T \rightarrow \infty } E\left[ \frac{1}{T} \sum _{k=0}^{T} L_i\left( x_k,\theta _i,u_k^1,\dots ,u_k^N\right) \right] . \end{aligned}$$
(3)

Equations (1), (2) or (1), (3) can describe dynamic games as well as repeated static games.

Each player receives at each time step an information vector according to one of the following information structures:

Information Structure 1:

$$\begin{aligned} \bar{I}^{i,new}_{k} = \left( x_k,u_{k-1}^i\right) \text {~or} \end{aligned}$$
(4)

Information Structure 2:

$$\begin{aligned} \bar{I}^{i,new}_{k} = \left( x_k,u_{k-1}^i,u_{k-1}^{-i}\right) , \end{aligned}$$
(5)

where \(u^{-i}\) stands for \((u^{1},\dots ,u^{i-1},u^{i+1},\dots ,u^{N})\). The information that each player possesses at time step k has the form \(I^i_k = \left( \bar{I}^{i,new}_{0},\dots , \bar{I}^{i,new}_{k}\right) \).

The strategies of the players are given by \(s^i = (\gamma ^i_1,\gamma ^i_2,\dots )\), \(i=1,\dots ,N\), where \(\gamma ^i_k\) is a function having the form:

$$\begin{aligned} \gamma ^i_k: (x_0,x_1,\dots ,x_k,\theta _i)\mapsto u_k^i\in U^i. \end{aligned}$$

We will focus on “state feedback” strategies, where all the previous information is used only for “adaptation.” Specifically,

$$\begin{aligned} u_k^i=\gamma _k^i\left( x_k,\theta _i, \hat{\theta }^i_k\right) , \end{aligned}$$
(6)

where \( \hat{\theta }^i_k\) is the adapted parameter of Player i. We assume that the adapted parameter evolves according to a dynamic equation:

$$\begin{aligned} \hat{\theta }^i_{k+1} = \phi ^i \left( \hat{\theta }^i_{k},\bar{I}^{i,new}_{k+1},\theta _i\right) . \end{aligned}$$
(7)
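To fix ideas, (6), (7) can be instantiated for a hypothetical two-player game with quadratic instantaneous costs \(L_i=(u^i-\theta _i)^2+(u^i-u^{-i})^2\): \(\gamma ^i\) is the myopic best response to the adapted parameter and \(\phi ^i\) smooths the opponent’s observed action. The cost, the smoothing rule and the step size \(\alpha \) are assumptions made only for this sketch, not part of the general model.

```python
# Sketch of the adaptive-strategy structure (6)-(7):
#   u_k^i        = gamma^i(theta_i, thetahat_k^i)            (action rule)
#   thetahat^i_{k+1} = phi^i(thetahat_k^i, new information)  (adaptation rule)
# Here gamma^i best-responds to the estimate and phi^i is
# exponential smoothing of the opponent's observed action.

def gamma(theta_i, thetahat_i):
    # Minimizer of u -> (u - theta_i)^2 + (u - thetahat_i)^2
    return 0.5 * (theta_i + thetahat_i)

def phi(thetahat_i, u_opponent, alpha=0.3):
    # Smoothed tracking of the opponent's last action (assumed rule)
    return (1 - alpha) * thetahat_i + alpha * u_opponent

theta = [1.0, -1.3]      # true types (assumed values)
thetahat = [0.0, 0.0]    # initial adapted parameters

for _ in range(500):
    u = [gamma(theta[i], thetahat[i]) for i in (0, 1)]
    thetahat = [phi(thetahat[i], u[1 - i]) for i in (0, 1)]

print(u)  # approximately [0.2333, -0.5333]
```

The adapted parameters settle at the opponents’ equilibrium actions, so the limiting strategies form a Nash equilibrium of the corresponding full information game, in line with Property 1 below.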

For infinite horizon games, the following property, which holds true for several learning rules, is quite interesting (see for example [46]):

Property 1: The adapted parameters \(\hat{\theta }_k^1,\dots , \hat{\theta }_k^N\) converge to some limits \(\hat{\theta }_\infty ^1,\dots , \hat{\theta }_\infty ^N\) such that the feedback (no memory) strategies \(\gamma _k^i(x_k,\theta _i, \hat{\theta }_\infty ^i), ~i=1,\dots ,N\) constitute a strongly time consistent Nash equilibrium ([8]) for the corresponding complete information game.

Most of the research in adaptive/learning rules focuses on convergence issues, and several converging learning rules have been proposed. Consider such a set of adaptive/learning rules of the form (6), (7), and focus on Player i’s perspective. Player i faces a stochastic control problem with dynamics having state variable \((x_k,\hat{\theta }_k^{-i})\) and incomplete state observation \(y_k=x_k\). Furthermore, she has only a probabilistic description of the unknown parameters \(\theta ,\theta _{-i}\). Thus, Player i can use the available information to try to manipulate the overall system. However, the control problem described is, in general, very difficult. In the following section, we restrict ourselves to a simpler class of manipulating strategies.

3 Pretending Strategies

In this section we focus on a special class of manipulation strategies called the pretender’s strategies. Particularly, the manipulating player pretends to have a false type, possibly in response to the other players’ actions. For simplicity, we assume that the value of \(\theta \) is commonly known, i.e., \(\varTheta =\{\theta \}\). In what follows, only games with infinite horizon will be considered.

The general form of a pretender’s strategy that corresponds to (6), (7) is:

$$\begin{aligned} \hat{\theta }^i_{k+1}&= \phi ^i \left( \hat{\theta }^i_{k},\bar{I}^{i,new}_{k+1},\theta ^{i,pr}_k\right) ,\end{aligned}$$
(8)
$$\begin{aligned} u_k^i&=\gamma _k^i\left( x_k,\theta ^{i,pr}_k, \hat{\theta }^i_k\right) , \end{aligned}$$
(9)

where the pretended type \(\theta ^{i,pr}_k\) is given as an output of a system:

$$\begin{aligned} z^i_{k+1}= & {} \phi ^{i,pr} \left( z^i_k, \bar{I}^{i,new}_{k+1},\theta _i\right) ,\end{aligned}$$
(10)
$$\begin{aligned} \theta ^{i,pr}_k= & {} \psi ^i \left( \theta _i, z^i_k\right) . \end{aligned}$$
(11)

Equations (8)–(11) represent a manipulating player who pretends to have a type that depends on her real type and on a new, possibly augmented, adapted parameter \(z^i_k\). That is, in order to pretend adequately, it is probably useful to accumulate more information.

In what follows, we consider the stationary cases where one, some or all the players are pretending. These stationary outcomes are possible limit points of learning/adaptation rules, when the players are pretending.

3.1 Optimal Stationary Pretending

We consider the possible limit points of the pretending strategies, assuming games with infinite horizon and long-run average cost. We assume that only Player \(i_0\) is pretending. In the spirit of Property 1, we analyze the following situation. Player \(i_0\) has discovered all the useful information about \(\theta _{-i_0}\), and the pretended type of Player \(i_0\) has converged to \(\theta _\infty ^{i_0,pr}\). The rest of the players (\(-i_0\)) react to a player of type \(\theta _\infty ^{i_0,pr}\). Furthermore, the set of strategies \(\gamma ^{i_0}(x_k,\theta ^{i_0,pr}_\infty , \hat{\theta }^{i_0}_\infty ), \gamma ^{j}(x_k,\theta _j, \hat{\theta }^j_\infty ) \), \(j\ne i_0\), constitutes a strongly time consistent Nash equilibrium for the corresponding game with full information and types \(\theta ^{i_0,pr}_\infty ,\theta _{-i_0}\).

In order to define the optimal stationary pretending, the following assumption is made:

Assumption 1

For any set of types \(\theta _1,\dots , \theta _N\), there exists a unique strongly time consistent Nash equilibrium of the corresponding full information game. Let us denote by \(\gamma ^{i,N}_{\theta _1,\dots ,\theta _N}(x_k),~ i=1,\dots ,N\) the set of strategies constituting the Nash equilibrium. Furthermore, fixing any subset of strategies \(\gamma ^{i}\), \(i\in I_0\subset \{1,\dots ,N\}\), there is a unique set of strategies \((\gamma ^{j})_{j\not \in I_0}\) such that each \(\gamma ^{j}\) is a best response (B.R.) to \(\gamma ^{1},\dots , \gamma ^{j-1}, \gamma ^{j+1},\dots , \gamma ^{N}\).\(\square \)

The optimal stationary pretending for Player \(i_0\) is given by:

$$\begin{aligned} \theta ^{i_0,pr}_\infty = \underset{\tilde{\theta }_{i_0} \in \varTheta _{i_0}}{\text {argmin}} J_{i_0}\left( \gamma ^{{i_0},N}_{\tilde{\theta }_{i_0},\theta _{-{i_0}}},\gamma ^{{-i_0},N}_{\tilde{\theta }_{{i_0}},\theta _{-{i_0}}}\right) . \end{aligned}$$
(12)

It is interesting to compare the cost that the pretending player \(i_0\) attains with the cost of the full information game having Player \(i_0\) as Stackelberg leader. Let us first recall the Stackelberg equilibrium notion under Assumption 1 (see for example [8]).

Definition 1

Consider a set of strategies \((\gamma ^{{i_0},S}_{ \theta _{i_0},\theta _{-i_0}},\gamma ^{-{i_0},S}_{ \theta _{i_0},\theta _{-i_0}})\) within the feedback (no memory) class of strategies. This set constitutes a Stackelberg equilibrium with Player \({i_0}\) as the leader if:

$$\begin{aligned} \gamma ^{{i_0},S}_{ \theta _1,\dots ,\theta _N}\in \underset{\gamma ^{i_0}}{\text {argmin}} \left\{ J_{i_0}(\gamma ^{i_0},\gamma ^{-i_0}): \text{ for } \text{ all } j\ne i_0, \gamma ^{j} \text { is a B.R. to } \gamma ^{-j} \right\} . \end{aligned}$$
(13)

and, for all \(j\ne i_0\), \(\gamma ^{j,S}_{ \theta _{i_0},\theta _{-i_0}}\) is a best response to \( \gamma ^{-j}\). \(\square \)

We will use the term Stackelberg games for games in which one player, called the Stackelberg leader, announces her strategy and sticks to it, while the other player or players decide their actions taking the leader’s announcement into account.

Proposition 1

If Player \(i_0\) pretends optimally, then:

$$\begin{aligned} J_{i_0} \left( \gamma ^{{i_0},S}_{ \theta _{i_0},\theta _{-i_0}},\gamma ^{-{i_0},S}_{ \theta _{i_0},\theta _{-i_0}}\right) \le J_{i_0} \left( \gamma ^{{i_0},N}_{ \theta ^{{i_0},pr}_\infty ,\theta _{-i_0}},\gamma ^{-{i_0},N}_{ \theta ^{{i_0},pr}_\infty ,\theta _{-i_0}}\right) . \end{aligned}$$
(14)

Equality is attained if there exists a \(\tilde{\theta }_{i_0} \in \varTheta _{i_0}\) such that \(\gamma ^{{i_0},S}_{ \theta _{i_0},\theta _{-i_0}} = \gamma ^{{i_0},N}_{ \tilde{\theta }_{i_0},\theta _{-i_0}} \).

Proof

The strategy class \(\{\gamma ^{{i_0},N}_{\tilde{\theta }_{i_0},\theta _{-i_0}}: \tilde{\theta }_{i_0}\in \varTheta _{i_0}\}\) is a subclass of the feedback (no memory) strategy class. Furthermore, by Assumption 1, there is a unique set of strategies \((\gamma ^{{-i_0},N}_{\tilde{\theta }_{i_0},\theta _{-i_0}})\) such that the strategy of each player other than \({i_0}\) is a best response to the strategies of the others. Hence, the minimum in (13) is less than or equal to the minimum in (12). \(\square \)

Remark 1

Proposition 1 says that if there is enough uncertainty and only one player pretends, then this player can attain the same cost as if she were a Stackelberg leader. \(\square \)

3.2 The Pretenders’ Game

In this subsection, we study the case where two or more players are pretending. Under Assumption 1, an auxiliary game, called the pretenders’ game, can be defined.

Definition 2

For the game described by (1), (3), under Assumption 1, the corresponding pretenders’ game involves the same players, the action of Player i is \(\theta ^{i,pr}\in \varTheta _i\) and the cost is given by:

$$\begin{aligned} \bar{J}_{i} (\theta ^{i,pr},\theta ^{-i,pr} ) = J_i \left( \gamma ^{i,N}_{\theta ^{i,pr},\theta ^{-i,pr}},\gamma ^{-i,N}_{\theta ^{i,pr},\theta ^{-i,pr}}, \theta _i\right) , \end{aligned}$$
(15)

where \(\theta _i\) is the actual type. Let us also introduce the \(I_0\)-constrained pretenders’ game as the corresponding game where the actions of a set of players are constrained to coincide with the true values of their types, i.e., \(\theta ^{i,pr}=\theta _{i}\) for \(i\in I_0\).

An interesting point is the Nash equilibrium of the pretenders’ game. This point can serve as an alternative prediction of the outcome of the original game.

Definition 3

For the game described by (1), (3), under Assumption 1, a Nash pretenders’ outcome is the set of strategies \((\gamma ^{1,N}_{\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty },\dots ,\gamma ^{N,N}_{\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty })\), where \((\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty )\) is an equilibrium of the corresponding pretenders’ game. The constrained Nash pretenders’ outcome is defined similarly assuming that \((\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty )\) is an equilibrium of the constrained pretenders’ game.

The Nash pretenders’ outcome is interesting as a possible limit point of a learning/adaptive algorithm, where each of the players pretends dynamically to have a false type.

Remark 2

The pretending outcomes described, i.e., the optimal stationary pretending for one player and the Nash pretenders’ outcome when all the players are pretending, are at least as reasonable as the application of the original dynamic rules, since each player may have a motivation to pretend.

Let us note that other works also study deviations from best response. For example, in smooth fictitious play for repeated finite games (e.g., Ch. 4 of [16]), the players deviate slightly from their best response toward a randomized policy. This is different from pretending policies, because in the latter case the deviations from best response are used to induce a desired behavior in the other players. \(\square \)

Remark 3

Let us comment on the relationship with the work in [4], which identifies an interesting phenomenon in routing games with partially altruistic players: in some cases, when a user increases her degree of cooperation while the other users keep their degrees of cooperation unchanged, the performance of that user improves. To be more precise, if \(J_i\) is the “selfish” cost of Player i, [4] considers the case where the players are partially altruistic, that is, each player tries to minimize:

$$\begin{aligned} \tilde{J}_i = (1-a_i) J_i + a_i J_{-i}, \end{aligned}$$
(16)

where \(a_i\) is the degree of cooperation. It is then shown numerically that, when the players apply strategies constituting a Nash equilibrium of the game described by (16), it is possible that \(J_i\) decreases, as \(a_i\) increases. This phenomenon is called the “Paradox in cooperation.” In this context the notion of the price of unilateral altruism was introduced [5].

Consider a situation where the degree of cooperation of Player 2 is commonly known, whereas the degree of cooperation of Player 1 is known only to Player 1, and there is a “Paradox in cooperation.” Then, Player 1 has an incentive to pretend to be more altruistic than she actually is. Furthermore, even if Player 2 knew the actual degree of cooperation of Player 1, she would have an incentive to accept that Player 1 is more altruistic; otherwise, she would induce a less altruistic behavior in Player 1 and thus hurt herself. Finally, Player 2 cannot distinguish between the case in which Player 1 is altruistic and tries to reach a Nash equilibrium (of the corresponding full information game) and the case in which Player 1 is not altruistic and pretends. \(\square \)

4 Quadratic Opinion Games

In this section, a simple class of repeated, quadratic games is analyzed. In this class of games, exact results can be obtained. We first analyze a two-player simultaneous move (Nash) repeated game and then a sequential move (Stackelberg) repeated game. We then move to symmetric and asymmetric games with many players. The results of this section help us to get some intuition on the relationship between pretending and strategic leadership. Throughout this section we assume long-run average costs.

4.1 A Two-Player Simultaneous Move Game

A two-player repeated opinion game in strategic form is studied. The cost functions are given by (3) and the instantaneous costs by:

$$\begin{aligned} L_1 = (u^1-\theta _1)^2+(u^1-u^2)^2, \end{aligned}$$
(17)
$$\begin{aligned} L_2 = (u^2-\theta _2)^2+(u^2-u^1)^2, \end{aligned}$$
(18)

where \((\theta _1, \theta _2)\in \mathbb {R}^2\). We assume an information structure given by (5). The full information (static) game has a unique Nash equilibrium:

$$\begin{aligned} u^{i,N} = \frac{2}{3}\theta _i+\frac{1}{3}\theta _{-i}, ~~ i=1,2. \end{aligned}$$
(19)

Several adaptive (iterative) techniques were studied in [32]. Probably, the simplest one is the best response map:

$$\begin{aligned} u^i_{k}&= \left( \theta _i+\hat{\theta }^i_k\right) /2,\end{aligned}$$
(20)
$$\begin{aligned} \hat{\theta }^i_k&=u^{-i}_{k-1}. \end{aligned}$$
(21)

If both players follow their best response maps, their actions will converge to the Nash equilibrium of the full information static game.
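This convergence can be checked with a direct simulation of (20), (21) (a minimal sketch; the initial actions are arbitrary assumptions and the types are those used in Fig. 1):

```python
# Best response dynamics (20)-(21): each player observes the
# opponent's previous action and best-responds to it.
theta = [1.0, -1.3]      # theta_1, theta_2 as in Fig. 1
u_prev = [0.0, 0.0]      # arbitrary initial actions (assumption)

for _ in range(100):
    # thetahat^i_k = u^{-i}_{k-1};  u^i_k = (theta_i + thetahat^i_k)/2
    u = [(theta[i] + u_prev[1 - i]) / 2 for i in (0, 1)]
    u_prev = u

# Nash equilibrium of the full information static game, Eq. (19)
nash = [(2 * theta[i] + theta[1 - i]) / 3 for i in (0, 1)]
print(u, nash)
```

The iteration contracts by a factor of 1/2 per step, so the actions reach (19) up to machine precision.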

Let us then study the case where Player 1 tries to manipulate Player 2, while Player 2 follows (20), (21). Since the map \(\theta _1\mapsto u^{1,N}\) in (19) is onto, Proposition 1 applies. Thus, the feedback Stackelberg cost for Player 1 can be attained through pretending.

The optimal pretending for Player 1 is given by:

$$\begin{aligned} \theta ^{1,pr}_\infty = \frac{6}{5} \theta _1 -\frac{1}{5} \theta _2. \end{aligned}$$
(22)
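For completeness, let us sketch how (22) arises. The feedback Stackelberg action of Player 1 minimizes (17) subject to the follower’s best response \(u^2=(\theta _2+u^1)/2\), which gives \(u^{1,S}=(4\theta _1+\theta _2)/5\). On the other hand, pretending the type \(\tilde{\theta }_1\) induces, via (19), the Nash action \(u^1=\frac{2}{3}\tilde{\theta }_1+\frac{1}{3}\theta _2\). Equating the two:

$$\begin{aligned} \frac{2}{3}\tilde{\theta }_1+\frac{1}{3}\theta _2 = \frac{4\theta _1+\theta _2}{5} \quad \Longrightarrow \quad \tilde{\theta }_1 = \frac{6}{5}\theta _1-\frac{1}{5}\theta _2. \end{aligned}$$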

Player 1 can use several ways to learn \(\theta _2\) in order to implement her pretending policy. One way is to use only the last iteration. Particularly:

$$\begin{aligned} z^{1,1}_{k+1}= & {} z^{1,2}_{k}, \end{aligned}$$
(23)
$$\begin{aligned} z^{1,2}_{k+1}= & {} u^1_{k}, \end{aligned}$$
(24)
$$\begin{aligned} \theta ^{1,pr}_k= & {} \frac{6}{5} \theta _1 -\frac{1}{5} \left( 2u^2_{k-1}- z^{1,1}_{k}\right) . \end{aligned}$$
(25)
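The closed loop in which Player 2 follows the best response map (20), (21) and Player 1 follows the pretending rule (23)–(25) can be simulated as follows (a minimal sketch; the initial actions are assumptions):

```python
# Player 2 best-responds to Player 1's last action, Eqs. (20)-(21).
# Player 1 reconstructs theta_2 = 2*u2_{k-1} - u1_{k-2} from Player 2's
# best response, Eq. (25), and best-responds with the pretended type.
theta1, theta2 = 1.0, -1.3
u1_hist, u2_hist = [0.0, 0.0], [0.0]   # assumed initial actions

for _ in range(200):
    u2 = (theta2 + u1_hist[-1]) / 2                                  # (20)-(21)
    theta_pr = 1.2 * theta1 - 0.2 * (2 * u2_hist[-1] - u1_hist[-2])  # (25)
    u1 = (theta_pr + u2_hist[-1]) / 2                                # (9) with (20)
    u1_hist.append(u1)
    u2_hist.append(u2)

# Stackelberg outcome with Player 1 as leader: minimizing (17)
# subject to the follower's reaction gives u1 = (4*theta1 + theta2)/5.
u1_S = (4 * theta1 + theta2) / 5
u2_S = (theta2 + u1_S) / 2
print(u1_hist[-1], u2_hist[-1], u1_S, u2_S)
```

The actions converge geometrically to the Stackelberg values, in accordance with Proposition 1.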

Figure 1 shows the action trajectories when no player, one player, or both players are pretending. The parameters are \(\theta _1=1\) and \(\theta _2=-1.3\). If no player is pretending, the dynamic rules lead to the Nash equilibrium. If a single player is pretending, the dynamic rule converges to the Stackelberg equilibrium with the pretending player as the leader. Finally, when both players are pretending, the dynamic rule converges to the Nash pretenders’ outcome.

Fig. 1

The solid line corresponds to the best response with no pretending players, the dashed line to Player 1 pretending, the dash-dot line to Player 2 pretending, and the dotted line to the case in which both players pretend. The outcomes in the several cases studied are widely different

4.2 Two-Player Sequential Move Games

We then study a dynamic game which represents the repeated version of a static Stackelberg game (or equivalently a repeated extensive form game). Particularly, Player 1 acts at time instants \(1,3,5,\dots \) and Player 2 at \(2,4,6,\dots \). The feedback Nash equilibrium of the dynamic game corresponds to the Stackelberg equilibrium of the static game.

The instantaneous costs are given by:

$$\begin{aligned} L^1(k)&={\left\{ \begin{array}{ll} \left( x_k-1\right) ^2+\theta _1\left( x_k-u^2_k\right) ^2,~ &{}k=2,4,..\\ 0 &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$
(26)
$$\begin{aligned} L^2(k)&={\left\{ \begin{array}{ll} \left( u_k^2-2\right) ^2+\theta _2\left( x_k-u^2_k\right) ^2,~ &{}k=2,4,.. \\ 0 &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$
(27)

and the state equation by:

$$\begin{aligned} x_{k+1} = {\left\{ \begin{array}{ll} u^1_k &{}\text{ if } k = 1,3,5,\dots \\ \emptyset &{} \text{ if } k=2,4,6,\dots \end{array}\right. } \end{aligned}$$
(28)

It is not difficult to prove that the Nash equilibrium (which coincides with the Stackelberg equilibrium of the corresponding static game) is given by:

$$\begin{aligned} u^1_k&= \frac{\left( 1+\theta _2\right) ^2+2\theta _1}{\left( 1+\theta _2\right) ^2+\theta _1}, ~~~~ k=1,3,5\dots \end{aligned}$$
(29)
$$\begin{aligned} u^2_k&= \frac{2+\theta _2x_k}{1+\theta _2 }, ~~~~ ~~~~~~~~~~ ~k=2,4,6 \dots \end{aligned}$$
(30)

To prove this fact, observe that (30) represents a best response for Player 2 at time step \(k=2\nu \), given the action of Player 1 at time step \(2\nu -1\), where \(x_k=u^1_{k-1}\). Then, substituting (30) into (26) and minimizing for \(x_k=u^1_{k-1}\), we get (29).

A simple dynamic rule that leads to the Nash equilibrium is given by:

$$\begin{aligned} u^1_1= & {} 1, \end{aligned}$$
(31)
$$\begin{aligned} u^1_k= & {} \frac{\left( 1+\hat{\theta }_1\right) ^2+2\theta _1}{\left( 1+\hat{\theta }_1\right) ^2+\theta _1}, ~~~~ k=3,5,7\dots \end{aligned}$$
(32)
$$\begin{aligned} u^2_k= & {} \frac{2+\theta _2x_k}{1+\theta _2 }, ~~~~ ~~~~~~~~~~ ~k=2,4,6 \dots \end{aligned}$$
(33)

where \(\hat{\theta }_1\) is the least squares estimate of \(\theta _2\) based on (30). To see that (31)–(33) converge to the Nash equilibrium in a finite number of steps, first observe that at time step \(k=3\) Player 1 already knows the value of \(\theta _2\), and that for \(\hat{\theta }_1=\theta _2\), (32), (33) coincide with (29), (30).
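This finite-step convergence can be verified numerically (a sketch with assumed type values). Since (30) is exact, a single observation of \((x_k,u^2_k)\) identifies \(\theta _2=(2-u^2_k)/(u^2_k-x_k)\), provided \(u^2_k\ne x_k\), so the least squares estimate is exact after one follower move:

```python
theta1, theta2 = 1.0, 1.0   # assumed type values

# k = 1: Player 1 plays u^1_1 = 1, Eq. (31)
x = 1.0
# k = 2: Player 2 best-responds, Eq. (33)
u2 = (2 + theta2 * x) / (1 + theta2)
# Player 1 recovers theta_2 exactly by inverting Eq. (30)
theta_hat = (2 - u2) / (u2 - x)
# k = 3: Player 1 plays Eq. (32), which now coincides with Eq. (29)
u1 = ((1 + theta_hat) ** 2 + 2 * theta1) / ((1 + theta_hat) ** 2 + theta1)
print(theta_hat, u1)  # 1.0, 1.2
```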

A very simple manipulation rule for Player 2 is to use (33) with \(\theta ^{2,pr}_k =\varepsilon -1\) in place of \(\theta _2\), where \(\varepsilon \) is a small positive constant. Using these rules, after two time steps the actions of the players converge to:

$$\begin{aligned} u^1= & {} \frac{\varepsilon ^2+2\theta _1}{\varepsilon ^2+\theta _1},\end{aligned}$$
(34)
$$\begin{aligned} u^2= & {} \frac{\varepsilon ^2+\varepsilon +2\theta _1}{\varepsilon ^2+\theta _1}. \end{aligned}$$
(35)

As \(\varepsilon \) decreases to 0, the actions of both players approach the value 2. In this case, the cost of Player 2 approaches 0.
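A quick numerical check of (34), (35) and of the follower’s limiting cost (a sketch; \(\varepsilon \) and the type values are assumptions):

```python
theta1, theta2 = 1.0, 1.0   # assumed true types
eps = 1e-3                  # pretended type: theta^{2,pr} = eps - 1

u1 = (eps**2 + 2 * theta1) / (eps**2 + theta1)        # Eq. (34)
# Player 2's best response (33) evaluated with the pretended type:
u2 = (2 + (eps - 1) * u1) / eps
# Consistency with the closed form, Eq. (35):
u2_closed = (eps**2 + eps + 2 * theta1) / (eps**2 + theta1)
# Player 2's true instantaneous cost (27) at the limit actions:
J2 = (u2 - 2) ** 2 + theta2 * (u1 - u2) ** 2
print(u1, u2, J2)
```

Both actions are close to 2 and the follower’s true cost J2 is of order \(\varepsilon ^2\), consistent with the discussion above.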

Remark 4

The Stackelberg leader (Player 1) behaves approximately as the follower wants her to behave. The situation is quite similar to an inverse Stackelberg game with Player 2 as the leader [24]. Thus, Player 2 can achieve a very strong form of leadership and in this sense, we may conclude that: “At least in some cases, regarding leadership, the best use of the information available is much more important than the commitment or the timing of the game.”

Remark 5

Player 2 can achieve approximately the same cost as if she were acting on Player 1’s behalf. An interesting question arises: “In what sense does this outcome express the ‘free will’ of Player 1?”

4.3 Many Player Symmetric Games

We then move to a slightly more general case, where there are N symmetrically interacting players. The instantaneous costs have the form:

$$\begin{aligned} L_i = (u^i-\theta _i)^2+\left( u^i-\frac{1}{N-1}\sum _{j\ne i} u^j\right) ^2. \end{aligned}$$
(36)

The corresponding full information game has a unique feedback Nash equilibrium. It is not difficult to show that the equilibrium is given by:

$$\begin{aligned} u^i=\frac{N\theta _i+\sum _{j\ne i} \theta _j}{2N-1}. \end{aligned}$$
(37)
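As a sanity check (not part of the original text), one can verify numerically that (37) satisfies the first-order condition \(u^i=\frac{1}{2}\big (\theta _i+\frac{1}{N-1}\sum _{j\ne i}u^j\big )\) obtained from (36); the \(\theta _i\) values below are arbitrary.

```python
# Verify (37): each u_i is a best response, i.e., u_i = (theta_i + mean of the others)/2.
# The theta values are arbitrary illustrative numbers.
N = 5
theta = [0.3, -1.2, 2.0, 0.7, -0.4]
S = sum(theta)
u = [(N*theta[i] + (S - theta[i])) / (2*N - 1) for i in range(N)]  # (37)

max_err = 0.0
for i in range(N):
    mean_others = (sum(u) - u[i]) / (N - 1)
    best_response = 0.5 * (theta[i] + mean_others)  # minimizer of (36) in u_i
    max_err = max(max_err, abs(u[i] - best_response))
print(max_err)
```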

We then consider the pretenders’ game. Using \(\theta ^{i,pr}\) in the place of \(\theta ^{i}\) in (37) and substituting back in (36) we get the costs in the corresponding pretenders’ game:

$$\begin{aligned} \bar{J}_i&= \frac{1}{(2N-1)^2}\left[ \left( N\theta ^{i,pr} +\sum _{j\ne i} \theta ^{j,pr}-(2N-1)\theta ^i \right) ^2 \right. \nonumber \\&\quad \left. + \left( (N-1)\theta ^{i,pr}-\sum _{j\ne i} \theta ^{j,pr} \right) ^2 \right] . \end{aligned}$$
(38)

Setting the partial derivative of \(\bar{J}_i \) with respect to \(\theta ^{i,pr} \) equal to zero and solving for \(\theta ^{i,pr} \), we obtain the following characterization of the equilibrium of the pretenders’ game:

$$\begin{aligned} \theta ^{i,pr}=\frac{2N^2-N}{2N^2-2N+1}\theta ^i-\frac{1}{2N^2-2N+1}\sum _{j\ne i} \theta ^{j,pr}. \end{aligned}$$
(39)

Hence, the equilibrium of the pretenders’ game is given by:

$$\begin{aligned} \theta ^{i,pr}=\left( 1+\frac{1}{2N} \right) \theta ^i -\frac{1}{2N(N-1)}\sum _{j\ne i} \theta ^j. \end{aligned}$$
(40)
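The closed form (40) can be checked against the best-response characterization (39); one can also observe that the pretended types in (40) sum to the same total as the true types. The \(\theta _i\) values below are arbitrary.

```python
# Check that (40) solves the best-response equations (39).
# The theta values are arbitrary illustrative numbers.
N = 4
theta = [1.0, -0.5, 0.25, 2.0]
S = sum(theta)
pr = [(1 + 1/(2*N))*theta[i] - (S - theta[i])/(2*N*(N - 1)) for i in range(N)]  # (40)

Spr = sum(pr)
D = 2*N*N - 2*N + 1
# residual of (39): pr_i = [(2N^2-N) theta_i - sum_{j != i} pr_j] / D
err = max(abs(pr[i] - ((2*N*N - N)*theta[i] - (Spr - pr[i]))/D) for i in range(N))
print(err, abs(Spr - S))
```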

Remark 6

Equation (40) shows that, when the interactions are symmetric, the players tend to pretend less as the number of players increases. Furthermore, the value of pretending decreases, and the Nash pretenders’ outcome approaches the Nash equilibrium as the number of players increases.

4.4 Many Player Asymmetric Games

We then consider a quadratic game with N players having asymmetric interactions. The instantaneous cost of each player is given by:

$$\begin{aligned} L_i = \left( u^i-\theta _i\right) ^2+\left( u^i-\sum _{j=1}^N a_{ij}u^j \right) ^2, \end{aligned}$$
(41)

where \(a_{ij}\) represents the influence of the actions of Player j on Player i or, in the opinion game setting, how much Player i is interested in Player j’s opinion. We assume that \(\sum _{j=1}^Na_{ij}=1\) and that \(a_{ii}=0\), for any \(i=1,\dots ,N\). These weights are collected in the matrix \(A=[a_{ij}]\). We further assume that \(\theta _i\in [-M,M]\).

The Nash equilibrium is characterized by:

$$\begin{aligned} u=\frac{1}{2}\theta +\frac{1}{2}Au, \end{aligned}$$
(42)

where \(\theta =[\theta _1,\dots ,\theta _N]^T\) and \(u=[u^1,\dots ,u^N]^T\). There exists a unique Nash equilibrium:

$$\begin{aligned} u=\frac{1}{2}(I-\frac{1}{2}A)^{-1} \theta =\frac{1}{2}A^+\theta , \end{aligned}$$
(43)

where \(A^+=I+\frac{1}{2}A+\frac{1}{4}A^2+\dots \). The series in \(A^+\) converges because A is a stochastic matrix, so the spectral radius of \(\frac{1}{2}A\) is at most 1/2.
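The two expressions for \(A^+\) and the fixed-point characterization (42) can be checked numerically on a randomly generated stochastic matrix with zero diagonal (an illustrative instance, not one from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = rng.random((N, N))
np.fill_diagonal(A, 0.0)
A /= A.sum(axis=1, keepdims=True)   # row-stochastic, zero diagonal

Aplus = np.linalg.inv(np.eye(N) - 0.5*A)                           # (I - A/2)^{-1}
series = sum(np.linalg.matrix_power(0.5*A, t) for t in range(60))  # truncated series

theta = rng.standard_normal(N)
u = 0.5 * Aplus @ theta                               # (43)
residual = np.abs(u - (0.5*theta + 0.5*A @ u)).max()  # (42)
print(np.abs(Aplus - series).max(), residual)
```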

For each Player i, let us introduce the following quantity:

$$\begin{aligned} C_i = e_i^T\left( \sum _{t=2}^\infty A^t/2^t\right) e_i. \end{aligned}$$
(44)

We call \(C_i\) the “centrality of Player i.” The optimal pretending of each player will be expressed in terms of \(C_i\).
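Since \(a_{ii}=0\), the \(t=0\) and \(t=1\) terms of the series for \(A^+\) contribute nothing to its diagonal, so \(C_i\) is also the ith diagonal entry of \(A^+\) minus one; this, together with the identity \(e_i^TAA^+e_i=2C_i\) used below, can be verified numerically on a random instance:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
A = rng.random((N, N))
np.fill_diagonal(A, 0.0)
A /= A.sum(axis=1, keepdims=True)   # row-stochastic, zero diagonal

# (44) via a truncated series (terms decay like 2^-t)
C_series = sum(np.diag(np.linalg.matrix_power(A, t)) / 2.0**t for t in range(2, 80))
Aplus = np.linalg.inv(np.eye(N) - 0.5*A)
C_closed = np.diag(Aplus) - 1.0     # e_i^T A^+ e_i = 1 + C_i
two_C = np.diag(A @ Aplus)          # e_i^T A A^+ e_i = 2 C_i
print(C_series)
```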

Remark 7

The quantity \(C_i\) has some similarities with the Bonacich centrality measures [9] and the PageRank index [30]. The centrality \(C_i\) may have the following representation: Assume that the matrix A stands for the transition probabilities of a Markov chain. Consider a random walk on a graph, each node of which represents a player. At each time step, the random walk stops with probability 1/2 and with probability 1/2 jumps probabilistically to a new node according to A. Then, \(C_i\) is the expected number of times that a random walk started at the node of Player i will visit that node again. \(\square \)

Let us focus on a certain Player i. Consider a profile of (real or pretended) types \(\tilde{\theta }=[\tilde{\theta }_1,\dots ,\tilde{\theta }_{i-1},\theta _i,\tilde{\theta }_{i+1},\dots ,\tilde{\theta }_N]^T\). The pretended type of Player i can be written as \(\theta ^{i,pr}=\theta _i+\delta ^i\). The cost of Player i in the pretenders’ game is given by:

$$\begin{aligned} \bar{J}_i= & {} \left( \frac{1}{2}e_i^T A^+\tilde{\theta }-\theta _i+ \frac{1}{2}e_i^T A^+e_i\delta ^i \right) ^2 \\&+\,\left( \frac{1}{2}e_i^T A^+\tilde{\theta }+ \frac{1}{2}e_i^T A^+e_i\delta ^i -\frac{1}{2}e_i^T AA^+\tilde{\theta }-\frac{1}{2}e_i^T AA^+e_i\delta ^i \right) ^2. \end{aligned}$$

Observing that \(e_i^TA^+e_i=1+C_i\), that \(e_i^TAA^+e_i=2C_i\) and that

$$\begin{aligned} \alpha = \frac{1}{2}e_i^T A^+\tilde{\theta }-\theta _i =-\left[ \frac{1}{2}e_i^T A^+\tilde{\theta }-\frac{1}{2}e_i^T AA^+\tilde{\theta }\right] , \end{aligned}$$

the cost is written as:

$$\begin{aligned} \tilde{J}_i = \left( \frac{1+C_i}{2}\delta ^i+\alpha \right) ^2 +\left( \frac{1-C_i}{2}\delta ^i-\alpha \right) ^2. \end{aligned}$$
(45)

Hence, the optimal pretending is given by:

$$\begin{aligned} \delta ^{i,\star } = -\frac{2C_i}{1+C_i^2} \alpha =\frac{2C_i}{1+C_i^2} \left[ \theta _i-\frac{1}{2}e_i^T A^+\tilde{\theta }\right] . \end{aligned}$$
(46)

Proposition 2

There exists an equilibrium in the pretenders’ game.

Proof

Consider the set \(B=[-9M,9M]^N\). The following claim will be used: Claim: If \(\tilde{\theta }\in B\), then it holds \(\theta _i+\delta ^{i,\star }\in [-9M,9M]\), for each player i.

To prove the claim, observe first that every entry of the stochastic matrix \(A^t\) is at most 1, so \(C_i\le \sum _{t=2}^\infty 2^{-t}=1/2\) and thus \(f_i=2C_i/(1+C_i^2)\in [0,4/5]\). Furthermore, the entries of \(\frac{1}{2}A^+\) are nonnegative and its rows sum to 1, so \(\frac{1}{2}e_i^TA^+\tilde{\theta }\in [-9M,9M]\). Using (46),

$$\begin{aligned} \theta _i+\delta ^{i,\star } = \left( 1+f_i\right) \theta _i-f_i\frac{1}{2}e_i^T A^+\tilde{\theta }, \end{aligned}$$

and therefore \(|\theta _i+\delta ^{i,\star }|\le (1+f_i)M+9f_iM\le 9M\), since \(f_i\le 4/5\). This completes the proof of the claim.

The claim shows that the best response maps B into B. Furthermore, B is compact and convex and the map is continuous. Thus, Brouwer fixed point theorem applies and the pretenders’ game has an equilibrium. \(\square \)

Equation (46) shows that the pretending of each player depends on her centrality \(C_i\). The plot of the function \(f(C_i)=2C_i/(1+C_i^2)\) is shown in Fig. 2. Furthermore, \(\alpha \) is bounded. Thus, a player with low centrality tends to pretend less, and the value of pretending is small.

Fig. 2

The plot of the function \(f(C_i)=\frac{2C_i}{1+C_i^2}\)
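The existence asserted by Proposition 2 can also be illustrated numerically without relying on a closed form: starting from the true types, each player in turn replaces her pretended type by a numerically computed best response (a three-point parabola fit, which is exact for a quadratic cost). This is an illustrative sketch on a random instance, not a procedure from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
A = rng.random((N, N))
np.fill_diagonal(A, 0.0)
A /= A.sum(axis=1, keepdims=True)    # row-stochastic, zero diagonal
theta = rng.uniform(-1.0, 1.0, N)    # true types
Aplus = np.linalg.inv(np.eye(N) - 0.5*A)

def true_cost(i, pretended):
    u = 0.5 * Aplus @ pretended      # Nash actions for the pretended types
    return (u[i] - theta[i])**2 + (u[i] - A[i] @ u)**2

def best_response(i, pr):
    # The cost is quadratic in pr[i]; a three-point parabola fit is exact.
    f = []
    for s in (-1.0, 0.0, 1.0):
        trial = pr.copy()
        trial[i] += s
        f.append(true_cost(i, trial))
    return pr[i] - 0.5*(f[2] - f[0]) / (f[2] - 2.0*f[1] + f[0])

pr = theta.copy()
for _ in range(200):                 # best-response sweeps
    for i in range(N):
        pr[i] = best_response(i, pr)

gap = max(abs(pr[i] - best_response(i, pr)) for i in range(N))
print(pr, gap)
```

The sweeps converge quickly here because the best-response map is a contraction on this instance.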

In the following example, a game involving an influential player and many less influential players is analyzed.

Example 1

Consider a game with \(N+1\) players. The instantaneous cost for Player 0 is given by:

$$\begin{aligned} L_0 = (u^0-\theta _0)^2+\left( u^0-\frac{1}{N}\sum _{j=1}^N u^j \right) ^2. \end{aligned}$$
(47)

The instantaneous cost for each one of the players \(i=1,\dots ,N\) is given by:

$$\begin{aligned} L_i = (u^i-\theta _i)^2+\left( u^i-au^0-\frac{1-a}{N-1}\sum _{j\not \in \{0,i\}} u^j \right) ^2. \end{aligned}$$
(48)

The matrix A has the form:

$$\begin{aligned} A=\left[ \begin{matrix} 0&{} \frac{1}{N}\varvec{1}^T\\ a\varvec{1}&{} \frac{1-a}{N-1}(\varvec{1}\varvec{1}^T-I) \end{matrix}\right] . \end{aligned}$$

Figure 3 illustrates the centrality \(C_0\) of the influential player 0 and the centrality \(C_i\) of any other player i, as the number of players increases. The centrality computations were made for \(a=0.5\).

Fig. 3

The centrality of the influential player \(C_0\) and the centrality \(C_i\) of another player in Example 1
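The curves of Fig. 3 can be reproduced with a short computation; the sketch below builds the matrix A of Example 1 and evaluates the centralities for \(a=0.5\).

```python
import numpy as np

def centralities(N, a=0.5):
    """C_0 and C_i (any small player) for the (N+1)-player game of Example 1."""
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0 / N    # Player 0 weighs all small players equally
    A[1:, 0] = a          # each small player weighs Player 0 by a
    A[1:, 1:] = (1.0 - a) / (N - 1) * (np.ones((N, N)) - np.eye(N))
    Aplus = np.linalg.inv(np.eye(N + 1) - 0.5*A)
    C = np.diag(Aplus) - 1.0          # C_i = e_i^T A^+ e_i - 1
    return C[0], C[1]

for N in (2, 5, 20, 100):
    c0, ci = centralities(N)
    print(f"N={N}: C_0={c0:.4f}, C_i={ci:.4f}")
```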

As the number of players increases, only the influential player has a non-negligible motivation to pretend. Furthermore, the Nash pretenders’ outcome approaches the Stackelberg equilibrium with the influential player as the leader. \(\square \)

Remark 8

In Example 1, the players do not have differences in timing, commitment or the processing of the available information. However, the Nash pretenders’ outcome approaches the Stackelberg equilibrium as the number of players increases. This Stackelberg leadership appears intrinsically, without assuming different roles for the players a priori. Thus, pretending may serve as a natural Stackelberg leadership explanation.

Another way to understand why, when there is only one influential player, she may acquire a leader role is through the centrality measure \(C_i\). A high value of \(C_i\) means that there are some players who are affected by Player i’s pretended type, while at the same time Player i is interested in the actions of those same players. A small player is less able to affect, through pretending, the actions of the players which in turn affect her.

5 An LQ Non-Cooperative Pollution Control Game

In this section, we consider a very simple linear quadratic pollution control game with two players. The full information, continuous-time analogue of this model was analyzed in [11]. Each Player \(i=1,2\) stands for a country. The dynamics is given by:

$$\begin{aligned} x_{k+1}=ax_k+u_k^1+u_k^2. \end{aligned}$$
(49)

The state variable \(x_k\) represents the pollution stock, and the actions of the players \(u_k^1\) and \(u_k^2\) stand for the emissions of the two countries. The pollution stock decays naturally: at each time step a fraction \(1-a\) of the stock, where \(0<a<1\), is absorbed by the environment.

The cost functions of the players are given by:

$$\begin{aligned} J_i = \sum _{k=0}^\infty \rho _i^k\left[ qx_k^2+\left( u_k^i\right) ^2-u_k^i \right] . \end{aligned}$$
(50)

Cost (50) comprises the effects of two factors. The economic part is expressed by the difference \((u_k^i)^2-u_k^i\), whereas the term \(qx_k^2\) corresponds to the environmental cost.

We assume that both players care equally about pollution, i.e., the parameter \(q>0\) is common among the players, and that they have the same economic loss from reducing their emissions. However, they may have different discount factors \(\rho _i\in [0.9,0.99]\). We assume that the players know the parameters q and a and that each player knows her own discount factor. However, each player is not aware of the discount factor of the other player.

Consider a pair of strategies \(u_k^1 = K_1^1x_k+K_2^1\), \(u_k^2 = K_1^2x_k+K_2^2\) constituting a Nash equilibrium of the corresponding full information game. Then, this pair is characterized by:

$$\begin{aligned} p^i= & {} q+\frac{\rho _i p^i}{1+\rho _i p^i}(\bar{a}^i)^2,\end{aligned}$$
(51)
$$\begin{aligned} c_1^i= & {} \frac{\rho _i\bar{a}^i\left( 2 p^iv^i+ c_1^i\right) }{1+\rho _i p^i}, \end{aligned}$$
(52)
$$\begin{aligned} K_1^i = -\frac{\rho _i \bar{a}^i p^i}{1+\rho _i p^i},\end{aligned}$$
(53)
$$\begin{aligned} K_2^i=\frac{1}{2}-\rho _i\frac{2p^iv^i+c_1^i}{2(1+\rho _i p^i)}, \end{aligned}$$
(54)

where \(\bar{a}^i = a+K_1^{-i}\) and \(v^i=K_2^{-i}+1/2\) and \(p^1, p^2>0\). The details of the optimal control problems are given in “Appendix sections A and B.”

Proposition 3

If \(0<a<1\), there exists a solution of (51)–(54) with \(p^1, p^2>0\). Thus, there exists a Nash equilibrium.

Proof

See “Appendix C.” \(\square \)
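The system (51)–(54), whose solvability Proposition 3 guarantees, can be solved numerically by a damped fixed-point iteration. The sketch below uses the parameter values of Example 2 (\(q=0.1\), \(a=0.8\), \(\rho _1=\rho _2=0.95\)); the damping factor and iteration count are illustrative choices, not from the text.

```python
# Damped fixed-point iteration for the Nash equilibrium conditions (51)-(54).
q, a = 0.1, 0.8
rho = [0.95, 0.95]
p  = [q, q]         # p^i, initialized at q
c1 = [0.0, 0.0]     # c_1^i
K1 = [0.0, 0.0]     # feedback gains K_1^i
K2 = [0.0, 0.0]     # constant terms K_2^i
for _ in range(2000):
    for i in (0, 1):
        j = 1 - i
        abar = a + K1[j]          # \bar a^i = a + K_1^{-i}
        v = K2[j] + 0.5           # v^i = K_2^{-i} + 1/2
        d = 1.0 + rho[i]*p[i]
        p[i]  = 0.5*p[i]  + 0.5*(q + rho[i]*p[i]*abar**2/d)       # (51)
        c1[i] = 0.5*c1[i] + 0.5*rho[i]*abar*(2*p[i]*v + c1[i])/d  # (52)
        K1[i] = -rho[i]*abar*p[i]/d                               # (53)
        K2[i] = 0.5 - rho[i]*(2*p[i]*v + c1[i])/(2*d)             # (54)

# residuals of the implicit equations (51) and (52) at the computed point
res = 0.0
for i in (0, 1):
    j = 1 - i
    abar = a + K1[j]; v = K2[j] + 0.5; d = 1.0 + rho[i]*p[i]
    res = max(res,
              abs(p[i] - (q + rho[i]*p[i]*abar**2/d)),
              abs(c1[i] - rho[i]*abar*(2*p[i]*v + c1[i])/d))
print(p, K1, K2, res)
```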

Assuming that each player is not aware of the discount factor of the other player, a simple adaptive rule, based on an estimate \(\hat{\rho }_{-i}\) of the opponent’s discount factor, may be used. The rule for Player i is described in Algorithm 1.


Remark 9

Step 2 of Algorithm 1 (also Step 2 of Algorithm 2) estimates the discount factor of the opponent based only on her last action. Of course several alternatives are available, but (55) is probably the simplest one. \(\square \)

A simple pretending strategy for Player i is given in Algorithm 2. In Algorithm 2, Player i uses the estimated discount factor of her opponent in order to pretend optimally.


Example 2

In this example we consider two players with discount factors 0.95. Furthermore, we assume that the parameters have values \(q=0.1 \) and \(a=0.8\).

At first, no player is pretending. After time step 20, Player 1 pretends, applying Algorithm 2, and after time step 40 both players apply Algorithm 2. For comparison, the trajectories of the players’ actions and state variable for the Nash equilibrium of the full information game are also presented. In Fig. 4 the actions of the players are presented, Fig. 5 shows the evolution of the pollution stock, and Fig. 6 presents the evolution of the instantaneous cost for both players.

Remark 10

The numerical simulations show that the application of the dynamic rule described in Algorithm 1 produces results very similar to the Nash equilibrium of the corresponding full information game. In the case of pretending, the manipulating player has a motivation to pretend that she has a lower discount factor (i.e., that she cares less about the future) and relies on the other player to reduce her emissions. If one or both players pretend, the pollution stock increases. The non-pretending player needs to reduce her emissions and also faces a larger pollution stock. Thus, pretending in this example is not a cooperative action.

Fig. 4

The evolution of players’ actions in Example 2. In the first period of time, in which no one pretends, the emissions of the players are identical and their value is very close to the full information Nash equilibrium. After time step 20, Player 1 pretends to be shortsighted and her pollution emissions increase. This makes Player 2 reduce her emissions. After time step 40, both players pretend to be shortsighted. Now, Player 2 increases her emissions and makes Player 1 decrease hers. In this case, the total emissions are larger than in the case where no one pretends

Fig. 5

The evolution of the pollution stock in Example 2. The pollution stock increases after time step 20 and increases further after time step 40, when both players pretend to have a lower discount factor. In both cases the pollution stock increases, despite the fact that only one player is increasing her emissions, while the other is decreasing hers

Fig. 6

The instantaneous costs of the players in Example 2. In the first part of the simulation, where both players follow their adaptive rules with no pretending, the instantaneous costs are very close to their Nash values. After time step 20, Player 1 attains a lower cost, but Player 2 is bound to emit less and also to have a more polluted environment. Thus, the instantaneous cost of Player 2 is larger than its Nash value, and this difference is larger than the improvement in Player 1’s cost. After time step 40, the instantaneous cost improves for Player 2, but remains worse than the Nash cost for both players

6 Electricity Market Models

In this section two competitive market models are analyzed. The first is a linear supply function duopoly model. A simple pretending algorithm is proposed, and the numerical simulation results show that pretending is a cooperative action, i.e., it increases the firms’ profits, while the price also increases. The second example considers a mechanism from the literature [34]. It is shown numerically that the attempt of the players to manipulate may make the mechanism stop working altogether.

6.1 Linear Supply Function Model

In this subsection, we assume that there are two competing production firms. Each one of them proposes a linear supply function [6]; that is, at each price level it produces an energy quantity proportional to that price. We also consider a linear demand of the form \(q=A-Bp\), where q is the total energy demand and p the price. The instantaneous cost of each player is given by:

$$\begin{aligned} L_i = c_iq_i^2-pq_i, \end{aligned}$$
(57)

where \(q_i\) is the amount of energy produced by Player i, the quantity \(c_iq_i^2\) represents the production cost and \(pq_i\) the revenues for Player i.

Each player chooses the constant \(u_i\) of the linear supply function; that is, the quantity produced depends on the price according to the linear relation \(q_i=u_ip\). We assume that each player knows her own production cost but not the production cost of her opponent.

Simple manipulations show that the price and the instantaneous cost of the players depend on \(u_1,u_2\) according to:

$$\begin{aligned} p=\frac{A}{u_1+u_2+B}, \end{aligned}$$
(58)

and

$$\begin{aligned} L_i=A^2\frac{c_iu_i^2-u_i}{(u_1+u_2+B)^2}. \end{aligned}$$
(59)

The best response maps are given by:

$$\begin{aligned} u_1^+= & {} \frac{B+u_2}{2c_1u_2+2c_1B+1}, \end{aligned}$$
(60)
$$\begin{aligned} u_2^+= & {} \frac{B+u_1}{2c_2u_1+2c_2B+1}. \end{aligned}$$
(61)

It is easy to see that the best response map is contractive with respect to the infinity norm. Thus, if the players use (60) and (61) repeatedly, the system will approach the Nash equilibrium of the static full information game.
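This convergence is easy to reproduce; the sketch below iterates (60) and (61) with the parameter values of Example 3 and checks that the limit is a fixed point of the best response map.

```python
# Best-response iteration (60)-(61); A, B, c1, c2 are the values of Example 3.
A, B = 1.0, 0.5
c1, c2 = 1.0, 1.2
u1, u2 = 0.0, 0.0
for _ in range(100):
    u1 = (B + u2) / (2*c1*u2 + 2*c1*B + 1)   # (60)
    u2 = (B + u1) / (2*c2*u1 + 2*c2*B + 1)   # (61)

# fixed-point residuals and the resulting price
r1 = abs(u1 - (B + u2) / (2*c1*u2 + 2*c1*B + 1))
r2 = abs(u2 - (B + u1) / (2*c2*u1 + 2*c2*B + 1))
price = A / (u1 + u2 + B)                    # (58)
print(u1, u2, price, max(r1, r2))
```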

A simple pretending strategy for Player 1 is described in Algorithm 3.

Fig. 7

The pretended production costs \(c^{1,pr}, c^{2,pr}\) in Example 3. Player 1 implements Algorithm 3 from the beginning, but Steps 6 and 7 are used for the first time at \(k=78\). Both players have an interest to pretend that they have a larger production cost than their actual one. After time step 200, when Player 2 starts using Algorithm 3 and moves to her pretending strategy, the motivation for Player 1 to pretend is slightly reduced; thus, the next time she reviews her pretended cost, i.e., at time step \(k=253\), she moves slightly toward her real cost


A similar pretending algorithm may be used also by Player 2.

Example 3

Assume that \(A=1\), \(B=0.5\), \(c_1=1\), \(c_2=1.2\). Player 1 implements Algorithm 3 from time step \(k=1\), and Player 2 uses the best response until time step \(k=200\) and Algorithm 3 afterward. For both players, we assume that Algorithm 3 is implemented with \(\varepsilon =0.01\). Figure 7 shows the evolution of the pretended production costs, Fig. 8 presents the evolution of the price, and Fig. 9 shows the evolution of the players’ instantaneous profits (negative total costs).

Fig. 8

The time evolution of the price in Example 3. The price increases, compared to the Nash equilibrium price, as the players pretend to have a higher cost. The price is slightly reduced at time step \(k=253\), following Player 1’s pretended cost reduction

Fig. 9

The time evolution of the profits (\(-L_i\)) in Example 3. The profits of both players increase compared to the Nash equilibrium. At the first adjustment of the pretended cost of Player 1, she increases her profits after two time steps, while the profits of Player 2 increase immediately. The profits of Player 1 have this transient response due to the fact that Steps 6 and 7 of Algorithm 3 make computations for the corresponding equilibrium and not for the dynamic response. The interesting feature of this figure is that the non-pretending player benefits more from the pretending than the pretender. Thus, pretending in this case is a cooperative action

Remark 11

In this example it happens that the Nash pretenders’ outcome induces lower costs (higher value) for both players, compared to the Nash equilibrium of the corresponding static full information game. Furthermore, each player has a motivation to pretend that she has a higher production cost, in order to drive the price higher. This in turn benefits the other player, for two different reasons: first, she is going to acquire a larger market share and, second, she is going to sell this larger amount of energy at a higher price. Thus, pretending is a cooperative action, i.e., each player prefers the other player to pretend than not to pretend. \(\square \)

6.2 The Rasouli–Teneketzis Mechanism

In this subsection we analyze an example of an electricity market having many competing production firms. At first, we briefly review a market mechanism described in [34]. An algorithm to approximate the Nash equilibrium is then proposed and tested numerically. The ability of the participants to manipulate the rules of their opponents is then studied.

There are N energy producers \(P_1,\dots ,P_N\). Each one of them produces a quantity of energy \(q^i\) and proposes a price \(p^i\); that is, the action of a player is given by \(u^i = [q^i~p^i]^T\). Let us denote the total demand by \(D_0>0\). The production cost for Player \(P_i\) is given by \(C_i(q^i)=c_i(q^i)^2\). The instantaneous cost of each player is given by:

$$\begin{aligned} L_i(q^i,t^i) = C_i(q^i)-t^i, \end{aligned}$$
(62)

where \(t^i\) is the payment to the producer i made by the system operator. The payment \(t^i\) is given by:

$$\begin{aligned} t^i=p^{i+1}q^i-(p^i-p^{i+1})^2-2p^i\zeta ^2, \end{aligned}$$
(63)

where \(p^{N+1}\equiv p^1\) and

$$\begin{aligned} \zeta = \sum _{i=1}^N q^i-D_0, \end{aligned}$$
(64)

is the power surplus (or deficit if it is negative).

The work [34] studies the non-trivial Nash equilibria of the game, i.e., the Nash equilibria for which \(q^1=\dots =q^N=0\) does not hold. It is shown that in any such equilibrium the total production coincides with the demand, the proposed prices are identical, and the total energy is produced with the smallest total cost (the sum of \(C_i\) is minimized). Furthermore, a non-trivial Nash equilibrium always exists. In this sense, the non-trivial Nash equilibria of the game produce “socially optimal” outcomes.
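The payment rule (62)–(64) is straightforward to implement. The sketch below also illustrates that when all proposed prices are equal and production exactly meets demand (\(\zeta =0\)), the penalty terms vanish and each producer is simply paid \(p\,q^i\); the numbers used are arbitrary.

```python
def payment(i, q, p, D0):
    # t^i from (63), with p^{N+1} identified with p^1, and zeta from (64)
    N = len(q)
    zeta = sum(q) - D0
    p_next = p[(i + 1) % N]
    return p_next*q[i] - (p[i] - p_next)**2 - 2*p[i]*zeta**2

def producer_cost(i, q, p, D0, c):
    return c[i]*q[i]**2 - payment(i, q, p, D0)   # (62)

# Equal prices and zero power surplus: payments reduce to p * q^i.
q = [3.0, 2.0, 1.0]
p = [1.5, 1.5, 1.5]
D0 = 6.0
c = [0.5, 2.0, 3.0]
t = [payment(i, q, p, D0) for i in range(3)]
print(t)
```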

We then propose a very simple algorithm (Algorithm 4). This algorithm is found numerically to converge asymptotically to a non-trivial Nash equilibrium of the game.


Lemma 1

If Algorithm 4 converges, then the limit point is a non-trivial Nash equilibrium of the game.

Fig. 10

The evolution of the prices in Example 4. Until approximately \(k=20\), the power produced does not meet the demand, and thus, the prices are increased according to (66). After \(k=20\), the prices are adjusted according to (67). After a transient phase, all the prices converge to the same limit

Fig. 11

The evolution of the produced quantities in Example 4. Until approximately \(k=20\), due to the increasing prices the energy quantities following (65) are also increasing linearly. After \(k=20\) the quantities converge rather quickly to their equilibrium values

Proof

First, assume that the dynamics described by Algorithm 4 converges to a fixed point \((p_i^{\infty }, q_i^\infty )_{ i=1}^N\). For this fixed point, it holds that \(\zeta \ge 0\): if \(\zeta <0 \), we would have \(p_{k+1}^i = p_k^i+\delta _1\) for each i, and \((p_i^{\infty }, q_i^\infty )_{ i=1}^N\) would not be a fixed point. Thus, using (65), we conclude that \(q_i^\infty \) is a best response to the actions of the others. Furthermore, the prices \(p_i^\infty \) are fixed if they correspond to a best response to the actions of the other players. Thus, any fixed point of (65) and (67) corresponds to a Nash equilibrium of the game. Observe that the trivial Nash equilibrium is not a fixed point. Thus, any fixed point of the dynamics is a non-trivial Nash equilibrium. It remains to show that the limit of the dynamics is a fixed point; for this, see “Appendix E.” \(\square \)

Example 4

In this example there are four production firms with production costs \(c_1=0.5, c_2=2, c_3=3, c_4= 10\). Figures 10 and 11 show the evolution of the energy production and the proposed prices assuming that the players follow Algorithm 4. \(\square \)

Fig. 12

The evolution of the prices in Example 5. Only Player 1 is pretending. The pretending of a single player is sufficient to increase the prices for all the players. Thus, the pretending of a player is beneficial to the others

A very simple manipulating strategy is then described. A manipulating Player i implements Algorithm 4 as if she had a different (not her real) production cost \(c^{i,pr}\). The pretended type is updated rarely but periodically, according to the rule:

$$\begin{aligned} c^{i,pr}(k+1)= & {} c^{i,pr}(k), \text {~~if~} k+1\ne tT, \nonumber \\ c^{i,pr}((t+1)T)= & {} c^{i,pr}(tT)-\,\rho \mathrm {~sat}_{[-M,M]} \nonumber \\&\left( \frac{L_i(tT+T-1)-L_i(tT-1)}{c^{i,pr}(tT+T-1)-c^{i,pr}(tT-1) } \right) , \end{aligned}$$
(68)

where \(\rho \) is a positive constant, T is the update period, and \(\mathrm {sat}_{[-M,M]}(\cdot )\) is the saturation function:

$$\begin{aligned} \mathrm {sat}_{[-M,M]}(z) = {\left\{ \begin{array}{ll} M, &{} \text{ if } z>M, \\ z, &{} \text{ if } |z|\le M, \\ -M, &{} \text{ if } z<-M. \end{array}\right. } \end{aligned}$$
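The update (68) can be sketched as follows: once every T steps, the pretended cost moves against a finite-difference estimate of the cost's sensitivity to it. The parameter values below are illustrative, and the rule assumes the pretended cost actually changed during the last period, so that the difference quotient is well defined.

```python
def sat(z, M):
    # saturation on [-M, M]
    return max(-M, min(M, z))

def update_pretended_cost(c_now, c_prev, L_now, L_prev, rho=0.05, M=1.0):
    # One periodic update of (68): step against the observed cost sensitivity.
    grad = (L_now - L_prev) / (c_now - c_prev)   # finite-difference estimate
    return c_now - rho * sat(grad, M)

# If raising the pretended cost from 1.0 to 1.1 lowered the realized cost
# from 0.6 to 0.5, the rule raises the pretended cost further.
new_cost = update_pretended_cost(1.1, 1.0, 0.5, 0.6)
print(new_cost)
```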

Remark 12

Pretending rule (68) does not estimate the production costs of the other players. Instead, it uses a gradient-descent-like rule to minimize the cost with respect to the pretended cost. A learning scheme closely related to (68) is directional learning [29]. In particular, in [29] a player increases the probability of contributing in a public goods game if a previous-round switch from non-contributing to contributing increased her payoff, or a switch from contributing to non-contributing reduced her payoff. Similarly, a player reduces her probability of contributing if a previous-round switch from contributing to non-contributing increased her payoff, or a switch from non-contributing to contributing reduced her payoff. \(\square \)

Fig. 13

The evolution of the energy production in Example 5. Only Player 1 is pretending. We may observe that the market share of Player 1 is reduced. However, the prices increase and her production cost is reduced. This makes the pretending player slightly better off. From the point of view of the other players, both the prices and their market shares increase. Thus, the pretending of Player 1 makes them significantly better off

Example 5

In this example there are eight production firms. We assume that only Player 1 (having the smallest production cost) is pretending, implementing (68). Figure 12 illustrates the evolution of the prices, and Fig. 13 the evolution of the production quantities. It turns out that the prices proposed by the players converge to identical values. Furthermore, the pretending of Player 1 reduces her market share but increases the prices. Thus, pretending is slightly beneficial for Player 1, whereas it is more beneficial for the other players, who sell energy at higher prices and also get larger market shares. Thus, in this case pretending is a cooperative behavior. \(\square \)

Example 6

This example continues Example 5, assuming that all the players are pretending. Figure 14 shows the evolution of the prices, and Fig. 15 the evolution of the quantities. In this example, the price grows unbounded. The reason is that the optimal pretended production cost for each player increases when the pretended costs of the others increase. This dynamics leads to price divergence. Therefore, the mechanism in this case does not work at all. \(\square \)

Fig. 14

The evolution of the prices in Example 6. All the players are pretending. Each player has a motivation to pretend that she has a production cost larger than her actual one. Furthermore, the pretending of the other players generates an incentive for a player to pretend that she has an even larger production cost. This dynamics leads to price divergence

Fig. 15

The evolution of the energy production in Example 6. All the players are pretending. As the prices increase, the differences in actual production cost become less important and the production quantities approach one another

Remark 13

The mechanism was designed in order to have a “socially optimal” behavior at the Nash equilibrium. However, each one of the players has a motivation to pretend. If all the players pretend, the system does not work at all, as illustrated in the last example, where the prices diverge. \(\square \)

7 Conclusion

We studied the possibility of manipulation in game situations where the players use a dynamic rule to determine their future actions. Some alternative outcomes were proposed, and the results were applied to quadratic opinion games, a linear quadratic environmental game and two electricity market models.

Several relationships between pretending and leadership were identified in the repeated quadratic game examples. In particular, in the two-player simultaneous-move repeated game the pretender attains the same cost as if she were the Stackelberg leader; in the repeated Stackelberg game, if the follower pretends she may achieve the same cost as if she were the leader in the corresponding inverse Stackelberg game; and in the case where there is an influential player and many players of low influence, the Nash pretenders’ outcome approaches the Stackelberg equilibrium as the number of players increases.

In the environmental game example, pretending leads to a less cooperative behavior and more pollution. In the electricity market games, pretending may enhance the cooperation among the players, increase the prices or even stop the system from working at all.

The alternative outcomes proposed are by no means the only reasonable ones, and further work is needed. For example, how would a player react knowing that her opponent tries to manipulate her? Another direction of future research is the generalization of the results about centrality to dynamic game situations. Furthermore, it would be interesting to study how to design the network of interactions so that the motivation to manipulate is small.