Abstract
This work studies dynamic game situations with incomplete structural information, motivated by problems arising in electricity market modeling. Adaptive/learning strategies are considered as an expression of the bounded rationality of the participants of the game. Such adaptive strategies are typically not in Nash equilibrium, so the possibility of manipulation appears: a player may exploit the dynamic rule of an opponent in order to manipulate her. We focus on a smaller class of manipulating strategies, called pretending strategies, in which a player acts as if she had preferences different from her real ones. It turns out that, under certain technical conditions, if only one player pretends, she can achieve the same cost as if she were the Stackelberg leader. The situation where all the players are pretending is then considered, and an auxiliary game, called the pretenders’ game, is introduced. A class of quadratic games is then studied, and several relations between pretending and Stackelberg leadership are derived. A linear quadratic environmental game is also studied. We then study some competitive electricity market models; in particular, a supply function model and the market mechanism described in Rasouli and Teneketzis (Electricity pooling markets with strategic producers possessing asymmetric information II: inelastic demand, arXiv:1404.5539, 2014) are considered. It turns out that pretending may increase competition or cooperation, and in some cases pretending may cause behaviors that prevent the system from working at all.
1 Introduction
In many real interactive decision situations (game situations), several decision makers interact strategically over time, but each of them has only partial knowledge of the intentions of the others. A particular example is an electricity market in which several production firms compete repeatedly over time and each firm knows its own costs but not the costs of the other firms [17, 42]. Such strategic interactions over time can be described by dynamic games with incomplete structural information.
Quite often it is very difficult to find a Nash equilibrium for dynamic or repeated games with incomplete information. Two difficulties may arise: the first is due to the “Witsenhausen effect” [44], i.e., the current action of each player affects the future state or parameter estimation of the other players. The second is due to the “dual control effect” [13], i.e., the current action of a player affects the quality of her own future parameter estimation. The latter difficulty makes the optimization problems involved particularly hard. Even in the degenerate game case (a one-player game), the optimal control problem with unknown parameters has not been solved analytically, except for a few special cases [45].
In this context, it has been proposed that the players (instead of trying to find the equilibrium strategies) use some simple deterministic or stochastic dynamic rules which determine their future actions based on their past actions. The use of such rules usually reflects the bounded rationality [38, 41] of the participants of the game, i.e., their inability to solve very difficult problems. These rules can be models of learning, adaptation, evolution or imitation [16, 40], [47, Ch. 7], [10, 27, 31]. Most of the research on dynamic rules focuses on convergence issues. However, a set of such strategies is typically not in Nash equilibrium.
A set of dynamic rules often converges to an equilibrium or a correlated equilibrium of a simplified game. For example, in repeated finite games, no-regret learning rules converge to a correlated equilibrium of the static full-information game [21], and in the same class of games a set of uncoupled stochastic rules converges probabilistically to a Nash equilibrium of the corresponding static game [14]. Such results are often used to justify (or as an alibi for) applying the equilibrium notions of the simplified games (for example, static full information) to the repeated or dynamic incomplete-information game situation.
Are these adaptive/learning strategies a reasonable prediction for the evolution of the game? In particular, if the players knew that they were going to implement those strategies, would they stick to them? When a player implements an adaptive control law, she may be viewed by another player as a system under control. That is, the other player may try to use the knowledge of the adaptation law of the first player to manipulate her. The topic of this work is to study the phenomena occurring when the players try to manipulate one another and the implications that manipulation may have for the costs of the participants of the game.
1.1 Contribution
First, a general form of adaptive/learning strategies is presented and an optimal control problem for strategic manipulation is stated. Then a smaller class of manipulating strategies, called “pretending” strategies, is considered: the manipulating player implements her learning/adaptation rule as if she had a type (preferences) different from her real one. Outcomes alternative to the Nash equilibrium are proposed for the cases in which one player pretends and in which all the players pretend. Under some technical conditions, if only one player pretends, she may achieve the same outcome as if she were a Stackelberg leader. We then study the case in which all the players are pretending, and some possible limit points are identified, using an auxiliary game called the “pretenders’ game.”
In order to gain some intuition, the notions described are applied to a very simple class of repeated quadratic opinion games, where analytical results are possible. At first, games with two players are analyzed and relationships between pretending and Stackelberg leadership are derived. It turns out that pretending enhances competition among the players. In the case of a repeated Stackelberg interaction, it turns out that if the Stackelberg follower pretends, she can achieve approximately the same cost as if she were acting on the Stackelberg leader’s behalf.
The cases of repeated quadratic opinion games with many players having symmetric and asymmetric interactions are then analyzed. In the symmetric case, it turns out that when the number of players increases, the motivation to pretend decreases, the value of pretending decreases, and the outcome approaches the Nash equilibrium.
In the asymmetric influence case, the motivation of each player to pretend depends on a measure of the centrality of her position in the network of interactions. In the case where there is one influential player and many players each one of which has a rather small influence, only the influential player has a non-negligible motivation to pretend and the outcome approaches the Stackelberg equilibrium with the influential player as the leader. Thus, a “natural” Stackelberg leadership appears.
A linear quadratic game with two players is then analyzed. The game describes the interaction of two countries emitting pollutants. In this example, each player has a motivation to pretend to have a lower discount factor, i.e., to pretend that she cares less about the future. The outcome of the game when the players are pretending is to emit more and have a more polluted environment. In all the examples of quadratic and linear quadratic games analyzed, pretending is not a cooperative action.
Two competitive electricity market models are then studied. At first, a linear supply function model is considered. In this game situation, the players have a motivation to pretend to have a larger production cost in order to increase the price. Pretending in this model is a cooperative action, i.e., the pretending of one player is beneficial to the other. The outcome when all the players pretend is that the players produce less, the price is higher, and their profit increases, compared to the Nash case.
In the second electricity market model, a mechanism from the literature is considered [34]. The mechanism is designed to work optimally at the Nash equilibrium. However, each of the participants has a motivation to pretend. Furthermore, if all the participants pretend, the prices diverge and the system stops working altogether.
1.2 Related Literature
A related topic is the literature on reputation effects [15, 26, 28]. These models consider a long-run, patient player and many short-lived or impatient opponents. Assuming that there is a possibility (even with small probability) that the patient player has a “committed” (or tough) type, the patient player may gain in equilibrium a strategic advantage similar to Stackelberg leadership. A model of two-sided reputation, resembling a war of attrition, was studied in [3]. The basic difference of the current work from the theory of reputation games is that in reputation games the roles of the players are asymmetric and specified a priori.
A very interesting related stream of research is the theory of indirect evolution [12, 18, 19, 22, 23, 25]. These models assume rational players whose preferences are affected by evolutionary forces. The evolution of preferences may explain phenomena such as altruism, spite, overconfidence, fairness, and reciprocity [23]. In [1] it is shown that if both the preferences (or the perceptions) and the actions are evolving, the players behave in an effectively rational way. The basic differences of the current work from the theory of indirect evolution are that the current work studies dynamic as well as repeated games and does not assume full rationality for the participants. Another related research area is the theory of coevolutionary games [33, 43]. In coevolutionary games, in addition to the players’ strategies, other properties, such as the structure of the network describing the interactions or the players’ payoffs, evolve, leading to outcomes different from those of evolutionary games. An interesting class of telecommunication games is studied in [2, 4, 5]. In these examples, the players are better off if, instead of optimizing against their own preferences, they have an amount of altruism, i.e., optimize against a weighted sum of the payoffs of all the players. This effect is called a “Paradox in Cooperation.”
A closely related topic is the theory of strategic delegation [39]. This theory studies situations where the acting party delegates its decision to another party having different interests.
Another related topic is the control of dynamic rules and strategic teaching. In [35], the optimal control of the best reply is studied, and in [36] it is shown that under any uncoupled learning rule the players have an incentive to deviate. Furthermore, in [37] it is shown that in a mixed population of myopic best-response and mimicking players, the imitators are better off. The difference of the current work from the theory of strategic teaching is, first, that it does not assign different roles to the players a priori and, second, that it suggests some alternative outcomes for the game.
In this work, we study the relationship between pretending and Stackelberg leadership. Endogenous models for Stackelberg leadership were studied in [7, 20].
1.3 Organization
The rest of this paper is organized as follows: In Sect. 2, the game and learning models are described. The pretenders’ strategies are studied in Sect. 3. The opinion quadratic games are studied in Sect. 4. Section 5 studies the LQ environmental game. Section 6 studies some electricity market games. Particularly, Sect. 6.1 studies a supply function model and Sect. 6.2 a mechanism presented in [34]. Finally, Sect. 7 summarizes the main contributions and proposes some future work.
2 Game Description and Adaptive Strategies
Let us start by describing an N-player game. The players have types \(\theta _1,\dots , \theta _N\). Each player knows her own type \(\theta _i\), and \(\theta _1,\dots , \theta _N\) are part of a random vector \(\bar{\theta } = [\theta _1 ~ \dots ~ \theta _N ~ ~\theta ]^T \in \bar{\varTheta } = \varTheta _1\times \dots \times \varTheta _N\times \varTheta \) which follows a commonly known distribution.
The dynamics has the form:
where \(u_k^1,\dots , u_k^N\) are the action variables of the players and \(w_k\) is a random disturbance with a commonly known distribution.
The cost functions have the form:
where \(\rho _{\theta _i}\in (0,1]\) is a discount parameter and T can have a finite or an infinite value. An alternative formula for the costs is:
Equations (1), (2) or (1), (3) can describe dynamic games as well as repeated static games.
Each player receives at each time step an information vector according to one of the following information structures:
Information Structure 1:
Information Structure 2:
where \(u^{-i}\) stands for \((u^{1},\dots ,u^{i-1},u^{i+1},\dots ,u^{N})\). The information that each player possesses at time step k has the form \(I^i_k = \left( \bar{I}^{i,new}_{0},\dots , \bar{I}^{i,new}_{k}\right) \).
The strategies of the players are given by \(s^i = (\gamma ^i_1,\gamma ^i_2,\dots )\), \(i=1,\dots ,N\), where \(\gamma ^i_k\) is a function having the form:
We will focus on “state feedback” strategies, where all the previous information is used only for “adaptation.” Specifically,
where \( \hat{\theta }^i_k\) is the adapted parameter of Player i. We assume that the adapted parameter evolves according to a dynamic equation:
For infinite horizon games, the following property, which holds true for several learning rules, is quite interesting (see for example [46]):
Property 1: The adapted parameters \(\hat{\theta }_k^1,\dots , \hat{\theta }_k^N\) converge to some limits \(\hat{\theta }_\infty ^1,\dots , \hat{\theta }_\infty ^N\) such that the feedback (no memory) strategies \(\gamma _k^i(x_k,\theta _i, \hat{\theta }_\infty ^i), ~i=1,\dots ,N\) constitute a strongly time consistent Nash equilibrium [8] for the corresponding complete information game.
Most of the research on adaptive/learning rules focuses on convergence issues, and several converging learning rules have been proposed. Consider such a set of adaptive/learning rules of the form (6), (7), and focus on Player i’s perspective. Player i faces a stochastic control problem with dynamics having state variable \((x_k,\hat{\theta }_k^{-i})\) and incomplete state observation \(y_k=x_k\). Furthermore, she has only a probabilistic description of the unknown parameters \(\theta ,\theta _{-i}\). Thus, Player i can use the information available in order to try to manipulate the overall system. However, the control problem described is, in general, very difficult. In the following section, we restrict ourselves to a simpler class of manipulating strategies.
3 Pretending Strategies
In this section, we focus on a special class of manipulation strategies, called the pretender’s strategies. Particularly, the manipulating player pretends to have a false type, possibly in response to the other players’ actions. For simplicity, we assume that the value of \(\theta \) is commonly known, i.e., \(\varTheta =\{\theta \}\). In what follows, only games with infinite horizon will be considered.
The general form of a pretender’s strategy that corresponds to (6), (7) is:
where the pretended type \(\theta ^{i,pr}_k\) is given as an output of a system:
Equations (8)–(11) represent a manipulating player who pretends to have a type that depends on her real type and on a new, possibly augmented, adapted parameter \(z^i_k\). That is, in order to pretend adequately, it is probably useful to accumulate more information.
In what follows, we consider the stationary cases where one, some or all the players are pretending. These stationary outcomes are possible limit points of learning/adaptation rules, when the players are pretending.
3.1 Optimal Stationary Pretending
We consider the possible limit points of the pretending strategies, assuming games with infinite horizon and long-run average cost. We assume that only Player \(i_0\) is pretending. In the spirit of Property 1, we analyze the following situation. Player \(i_0\) has discovered all the useful information for \(\theta _{-i_0}\), and the pretended type of Player \(i_0\) has converged to \(\theta _\infty ^{i_0,pr}\). The rest of the players (\(-i_0\)) react to a player of type \(\theta _\infty ^{i_0,pr}\). Furthermore, the set of strategies \(\gamma ^{i_0}(x_k,\theta ^{i_0,pr}_\infty , \hat{\theta }^{i_0}_\infty ), \gamma ^{j}(x_k,\theta _j, \hat{\theta }^j_\infty ) \), \(j\ne i_0\) constitute a strongly time consistent Nash equilibrium for the corresponding game with full information and types \(\theta ^{i_0,pr}_\infty ,\theta _{-i_0}\).
In order to define the optimal stationary pretending, the following assumption is made:
Assumption 1
For any set of types \(\theta _1,\dots , \theta _N\), there exists a unique strongly time consistent Nash equilibrium of the corresponding full information game. Let us denote by \(\gamma ^{i,N}_{\theta _1,\dots ,\theta _N}(x_k),~ i=1,\dots ,N\) the set of strategies constituting the Nash equilibrium. Furthermore, fixing any subset of strategies \(\gamma ^{i}\), \(i\in I_0\subset \{1,\dots ,N\}\), there is a unique set of strategies \((\gamma ^{j})_{j\not \in I_0}\) such that each \(\gamma ^{j}\) is a best response (B.R.) to \(\gamma ^{1},\dots , \gamma ^{j-1}, \gamma ^{j+1},\dots , \gamma ^{N}\).\(\square \)
The optimal stationary pretending for Player \(i_0\) is given by:
It is interesting to compare the cost that the pretending player \(i_0\) attains with the cost of the full information game having Player \(i_0\) as Stackelberg leader. Let us first recall the Stackelberg equilibrium notion under Assumption 1 (see for example [8]).
Definition 1
Consider a set of strategies \((\gamma ^{{i_0},S}_{ \theta _{i_0},\theta _{-i_0}},\gamma ^{-{i_0},S}_{ \theta _{i_0},\theta _{-i_0}})\) within the feedback (no memory) class of strategies. This set constitutes a Stackelberg equilibrium with Player \({i_0}\) as the leader if:
and for all \(j\ne i_0\), \(\gamma ^{j,S}_{ \theta _{i_0},\theta _{-i_0}}\) is a best response to \( \gamma ^{-j}\). \(\square \).
We will use the term Stackelberg games for games in which one player, called the Stackelberg leader, announces her strategy and sticks to it, while the other player or players decide their actions taking the leader’s announcement into account.
Proposition 1
If Player \(i_0\) pretends optimally, then:
Equality is attained if there exists a \(\tilde{\theta }_{i_0} \in \varTheta _{i_0}\) such that \(\gamma ^{{i_0},S}_{ \theta _{i_0},\theta _{-i_0}} = \gamma ^{{i_0},N}_{ \tilde{\theta }_{i_0},\theta _{-i_0}} \).
Proof
The strategy class \(\{\gamma ^{{i_0},N}_{\tilde{\theta }_{i_0},\theta _{-i_0}}: \tilde{\theta }_{i_0}\in \varTheta _{i_0}\}\) is a subclass of the feedback (no memory) strategy class. Furthermore, using Assumption 1, there is a unique set of strategies \((\gamma ^{{-i_0},N}_{\tilde{\theta }_{i_0},\theta _{-i_0}})\) such that the strategies of all the players except \({i_0}\) are best responses to the strategies of the others. Hence, the minimum in (13) is less than or equal to the minimum in (12). \(\square \)
Remark 1
Proposition 1 says that if there is enough uncertainty and only one player pretends, then this player attains the same cost as if she were a Stackelberg leader. \(\square \)
3.2 The Pretenders’ Game
In this subsection, we study the case where two or more players are pretending. Under Assumption 1, an auxiliary game, called the pretenders’ game, can be defined.
Definition 2
For the game described by (1), (3), under Assumption 1, the corresponding pretenders’ game involves the same players, the action of Player i is \(\theta ^{i,pr}\in \varTheta _i\) and the cost is given by:
where \(\theta _i\) is the actual type. Let us also introduce the \(I_0\)-constrained pretenders’ game as the corresponding game where the actions of a set of players are constrained to coincide with the true value of the players’ types, i.e., \(\theta ^{i,pr}=\theta _{i}\), for \(i\in I_0\).
An interesting point is the Nash equilibrium of the pretenders’ game. This point can serve as an alternative prediction of the outcome of the original game.
Definition 3
For the game described by (1), (3), under Assumption 1, a Nash pretenders’ outcome is the set of strategies \((\gamma ^{1,N}_{\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty },\dots ,\gamma ^{N,N}_{\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty })\), where \((\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty )\) is an equilibrium of the corresponding pretenders’ game. The constrained Nash pretenders’ outcome is defined similarly assuming that \((\theta ^{1,pr}_\infty ,\dots ,\theta ^{N,pr}_\infty )\) is an equilibrium of the constrained pretenders’ game.
The Nash pretenders’ outcome is interesting as a possible limit point of a learning/adaptive algorithm, where each of the players pretends dynamically to have a false type.
Remark 2
The pretending outcomes described, i.e., the optimal stationary pretending when one player pretends and the Nash pretenders’ outcome when all the players pretend, are at least as reasonable as the application of the original dynamic rules, since each player may have a motivation to pretend.
Let us note that there are also other works studying deviations from the best response. For example, in smooth fictitious play for repeated finite games (e.g., Ch. 4 of [16]), the players deviate slightly from their best response toward a randomized policy. This is different from pretending policies, because in the latter case the deviations from the best response are used to induce a desired behavior in the other players. \(\square \)
Remark 3
Let us comment on the relationship with the work in [4]. This work identifies an interesting phenomenon in routing games with partially altruistic players: in some cases, “When a user increases its degree of cooperation while other users keep their degree of cooperation unchanged, leads to performance improvement of that user.” To be more precise, if \(J_i\) is the “selfish” cost of Player i, [4] considers the case where the players are partially altruistic, that is, each player tries to minimize:
where \(a_i\) is the degree of cooperation. It is then shown numerically that, when the players apply strategies constituting a Nash equilibrium of the game described by (16), it is possible that \(J_i\) decreases as \(a_i\) increases. This phenomenon is called the “Paradox in cooperation.” In this context, the notion of the price of unilateral altruism was introduced [5].
Consider a situation where the degree of cooperation of Player 2 is commonly known, whereas the degree of cooperation of Player 1 is known only to Player 1, and there is a “Paradox in cooperation.” Then, Player 1 has an incentive to pretend to be more altruistic than she actually is. Furthermore, even if Player 2 knew the actual degree of cooperation of Player 1, she would have an incentive to accept that Player 1 is more altruistic; otherwise, she would induce a less altruistic behavior in Player 1 and thus hurt herself. Finally, Player 2 cannot distinguish between the case in which Player 1 is altruistic and tries to reach a Nash equilibrium (of the corresponding full information game) and the case in which Player 1 is not altruistic and pretends. \(\square \)
4 Quadratic Opinion Games
In this section, a simple class of repeated quadratic games is analyzed, for which exact results can be obtained. We first analyze a two-player simultaneous move (Nash) repeated game and then a sequential move (Stackelberg) repeated game. We then move to symmetric and asymmetric games with many players. The results of this section help us to get some intuition on the relationship between pretending and strategic leadership. Throughout this section we assume long-run average costs.
4.1 A Two-Player Simultaneous Move Game
A two-player repeated opinion game in strategic form is studied. The cost function is given by (3) and the instantaneous costs by:
where \((\theta _1, \theta _2)\in \mathbb {R}^2\). We assume an information structure given by (5). The full information (static) game has a unique Nash equilibrium:
Several adaptive (iterative) techniques were studied in [32]. Probably the simplest one is the best response map:
If both players follow their best response maps, their actions will converge to the Nash equilibrium of the full information static game.
Let us then study the case where Player 1 tries to manipulate Player 2 and Player 2 follows (20), (21). Due to the fact that the map \(\theta _1\mapsto u^{1,N}\) in (19) is onto, Proposition 1 applies. Thus, the feedback Stackelberg cost for Player 1 is feasible through pretending.
The optimal pretending for Player 1 is given by:
Player 1 can use several ways to learn \(\theta _2\) in order to implement her pretending policy. One way is to use only the last iteration. Particularly:
Figure 1 shows the action trajectories when no player, one player and both players are pretending. The parameters are \(\theta _1=1\) and \(\theta _2=-1.3\). If no player is pretending, then the dynamic rules lead to the Nash equilibrium. If a single player is pretending, the dynamic rule converges to the Stackelberg equilibrium having the pretending player as a leader. Finally, in the case where both players are pretending the dynamic rule converges to the Nash pretenders’ outcome.
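The three regimes of Fig. 1 can be sketched numerically. The displayed equations (17)–(22) are not reproduced above, so the cost used below is an assumption: a standard quadratic opinion cost \(J_i=(u^i-\theta _i)^2+(u^i-u^{-i})^2\), chosen to be consistent with the best response maps (20), (21) and with the structure of Sect. 4.4.

```python
import numpy as np

# ASSUMED instantaneous cost (the displays (17)-(22) are not reproduced here):
#   J_i(u^i, u^{-i}) = (u^i - theta_i)^2 + (u^i - u^{-i})^2,
# whose best response map (cf. (20), (21)) is u^i = (theta_i + u^{-i}) / 2.
theta1, theta2 = 1.0, -1.3            # the parameters used in Fig. 1

def br(theta_i, u_other):
    return 0.5 * (theta_i + u_other)

# 1) No player pretends: simultaneous best responses converge to the Nash
#    equilibrium of the full-information static game (the map is a contraction).
u1 = u2 = 0.0
for _ in range(200):
    u1, u2 = br(theta1, u2), br(theta2, u1)
u_nash = (u1, u2)                      # fixed point: ((2*th1+th2)/3, (th1+2*th2)/3)

# 2) Only Player 1 pretends: she steers Player 2's best response, i.e., she
#    minimizes J_1(u1, br(theta2, u1)) = (u1-th1)^2 + ((u1-th2)/2)^2 over u1,
#    which gives the Stackelberg action u1 = (4*th1 + th2)/5.
u1_st = (4.0 * theta1 + theta2) / 5.0
u2_st = br(theta2, u1_st)
# The pretended type inducing this action as a Nash outcome solves
# (2*th1_pr + th2)/3 = u1_st:
th1_pr = (3.0 * u1_st - theta2) / 2.0

print(u_nash, (u1_st, u2_st), th1_pr)
```

With both players pretending, the same construction, iterated over both pretended types, gives a candidate for the Nash pretenders’ outcome of Fig. 1.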
4.2 Two-Player Sequential Move Games
We then study a dynamic game which represents the repeated version of a static Stackelberg game (or equivalently a repeated extensive form game). Particularly, Player 1 acts at time instants \(1,3,5,\dots \) and Player 2 at \(2,4,6,\dots \). The feedback Nash equilibrium of the dynamic game corresponds to the Stackelberg equilibrium of the static game.
The instantaneous costs are given by:
and the state equation by:
It is not difficult to prove that the Nash equilibrium (which coincides with the Stackelberg equilibrium of the corresponding static game) is given by:
To prove this fact, observe that (30) represents a best response for Player 2 at time step \(k=2\nu \), given the action of Player 1 at time step \(2\nu -1\), where \(x_k=u^1_{k-1}\). Then, substituting (30) into (26) and minimizing with respect to \(u^1_{k-1}\), we get (29).
A simple dynamic rule that leads to the Nash equilibrium is given by:
where \(\hat{\theta }_1\) is Player 1’s least squares estimate of \(\theta _2\) based on (30). To see that (31)–(33) converge to the Nash equilibrium in a finite number of steps, first observe that at time step \(k=3\) Player 1 will know the value of \(\theta _2\), and that for \(\hat{\theta }_1=\theta _2\), (32), (33) coincide with (29), (30).
A very simple manipulation rule for Player 2 is to use (33) with \(\theta ^{2,pr}_k =(\varepsilon -1)\) in place of \(\theta _2\), where \(\varepsilon \) is a small positive constant. Using these rules, after two time steps, the actions of the players will converge to:
As \(\varepsilon \) decreases and approaches 0, the actions of both players approach the value 2. In this case, the cost of Player 2 approaches 0.
Remark 4
The Stackelberg leader (Player 1) behaves approximately as the follower wants her to behave. The situation is quite similar to an inverse Stackelberg game with Player 2 as the leader [24]. Thus, Player 2 can achieve a very strong form of leadership and in this sense, we may conclude that: “At least in some cases, regarding leadership, the best use of the information available is much more important than the commitment or the timing of the game.”
Remark 5
Player 2 can achieve approximately the same cost as if she were acting on Player 1’s behalf. An interesting question arises: “In which sense does this outcome express the ‘free will’ of Player 1?”
4.3 Many Player Symmetric Games
We then move to a slightly more general case, where there are N symmetrically interacting players. The instantaneous costs have the form:
The corresponding full information game has a unique feedback Nash equilibrium. It is not difficult to show that the equilibrium is given by:
We then consider the pretenders’ game. Using \(\theta ^{i,pr}\) in the place of \(\theta ^{i}\) in (37) and substituting back in (36) we get the costs in the corresponding pretenders’ game:
Considering the partial derivative of \(\bar{J}_i \) with respect to \(\theta ^{i,pr} \) and solving for \(\theta ^{i,pr} \), we obtain the following characterization of the equilibrium of the pretenders’ game:
Hence, the equilibrium of the pretenders’ game is given by:
Remark 6
Equation (40) shows that, when the interactions are symmetric, the players tend to pretend less as the number of players increases. Furthermore, the value of pretending decreases and the Nash pretenders’ outcome approaches the Nash equilibrium as the number of players increases.
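Remark 6 can be checked numerically. Since the displays (36)–(40) are not reproduced above, the sketch below rests on two assumptions: the quadratic opinion cost \(J_i=(u^i-\theta _i)^2+(u^i-\sum _j a_{ij}u^j)^2\) with symmetric weights \(a_{ij}=1/(N-1)\), and the Nash map \(u=\frac{1}{2}A^+\theta \) suggested by the quantities used in Sect. 4.4. The pretenders’ equilibrium is then computed by best-response iteration.

```python
import numpy as np

def nash_actions(theta_pr, A_plus):
    # ASSUMED feedback Nash actions of the full-information game,
    # u = (1/2) A^+ theta, with A^+ = (I - A/2)^{-1} as in Sect. 4.4.
    return 0.5 * (A_plus @ theta_pr)

def true_cost(i, u, theta, A):
    # ASSUMED instantaneous cost (Eq. (36) is not reproduced here):
    #   J_i = (u^i - theta_i)^2 + (u^i - sum_j a_ij u^j)^2.
    return (u[i] - theta[i]) ** 2 + (u[i] - A[i] @ u) ** 2

def pretenders_equilibrium(theta, A, sweeps=60):
    # Best-response iteration in the pretenders' game: each player picks the
    # pretended type minimizing her TRUE cost at the induced Nash actions.
    # J_i is an exact quadratic in theta_pr[i], so three samples pin it down.
    n = len(theta)
    A_plus = np.linalg.inv(np.eye(n) - 0.5 * A)
    theta_pr = theta.astype(float).copy()
    for _ in range(sweeps):
        for i in range(n):
            samples = np.array([-10.0, 0.0, 10.0])
            costs = []
            for v in samples:
                tp = theta_pr.copy()
                tp[i] = v
                costs.append(true_cost(i, nash_actions(tp, A_plus), theta, A))
            a2, a1, _ = np.polyfit(samples, costs, 2)  # exact quadratic fit
            theta_pr[i] = -a1 / (2.0 * a2)             # vertex of the parabola
    return theta_pr

devs = {}
for N in (2, 5, 20, 100):
    A = (np.ones((N, N)) - np.eye(N)) / (N - 1)        # symmetric interactions
    theta = np.linspace(-1.0, 1.0, N)
    devs[N] = np.abs(pretenders_equilibrium(theta, A) - theta).max()
print(devs)  # the maximal pretended deviation shrinks as N grows (Remark 6)
```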
4.4 Many Player Asymmetric Games
We then consider a quadratic game with N players having asymmetric interactions. The instantaneous cost of each player is given by:
where \(a_{ij}\) represents the influence of the actions of Player j to Player i or, in the opinion game setting, how much Player i is interested in Player j’s opinion. We assume that \(\sum _{j=1}^Na_{ij}=1\) and that \(a_{ii}=0\), for any \(i=1,\dots ,N\). The matrix \(A=[a_{ij}]\) will be used. We further assume that \(\theta _i\in [-M,M]\).
The Nash equilibrium is characterized by:
where \(\theta =[\theta _1,\dots ,\theta _N]^T\) and \(u=[u^1,\dots ,u^N]^T\). There exists a unique Nash equilibrium:
where \(A^+=I+\frac{1}{2}A+\frac{1}{4}A^2+\dots \), i.e., \(A^+=(I-\frac{1}{2}A)^{-1}\). The series in \(A^+\) converges because A is a stochastic matrix, so the spectral radius of \(\frac{1}{2}A\) is 1/2.
For each Player i, let us introduce the following quantity:
We call \(C_i\) the “centrality of Player i.” The optimal pretending of each player will be expressed in terms of \(C_i\).
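Since \(A^+=(I-\frac{1}{2}A)^{-1}\) and, by the identity \(e_i^TA^+e_i=1+C_i\) used below, \(C_i\) is the i-th diagonal entry of \(A^+\) minus one, the centralities reduce to the diagonal of a matrix inverse. A minimal computation (the 3-player matrix is only an illustration):

```python
import numpy as np

def centrality(A):
    # Sect. 4.4: A^+ = I + A/2 + A^2/4 + ... = (I - A/2)^{-1}, and the text
    # uses e_i^T A^+ e_i = 1 + C_i, so C_i = (A^+)_{ii} - 1.
    n = A.shape[0]
    A_plus = np.linalg.inv(np.eye(n) - 0.5 * A)
    return np.diag(A_plus) - 1.0

# Illustrative row-stochastic influence matrix for three players
# (zero diagonal, rows summing to one, as assumed in the text).
A = np.array([[0.0, 0.7, 0.3],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
C = centrality(A)
print(C)  # players who influence those who influence them score higher
```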
Remark 7
The quantity \(C_i\) has some similarities with the Bonacich centrality measures [9] and the PageRank index [30]. The centrality \(C_i\) may have the following representation: Assume that the matrix A stands for the transition probabilities of a Markov chain. Consider a random walk on a graph, each node of which represents a player. At each time step, the random walk stops with probability 1/2 and, with probability 1/2, jumps probabilistically to a new node according to A. Then, \(C_i\) is the expected number of times that a random walk started at the node of Player i will visit that node again. \(\square \)
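This random-walk representation is easy to verify by simulation. For two players with \(a_{12}=a_{21}=1\), the walk can return to its start only after an even number of jumps, so \(C_i=\sum _{m\ge 1}(1/2)^{2m}=1/3\); a Monte Carlo check (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # two players, each fully focused on the other

def expected_revisits(start, runs=100_000):
    # Walk of Remark 7: at each step it stops w.p. 1/2, otherwise jumps
    # according to the row A[node]; count the returns to the starting node.
    total = 0
    for _ in range(runs):
        node = start
        while rng.random() < 0.5:
            node = rng.choice(2, p=A[node])
            total += (node == start)
    return total / runs

C_exact = np.diag(np.linalg.inv(np.eye(2) - 0.5 * A)) - 1.0  # both entries 1/3
C_mc = expected_revisits(0)
print(C_exact[0], C_mc)  # agree up to Monte Carlo noise
```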
Let us focus on a certain Player i. Consider a profile of (real or pretended) types \(\tilde{\theta }=[\tilde{\theta }_1,\dots ,\tilde{\theta }_{i-1},\theta _i,\tilde{\theta }_{i+1},\dots ,\tilde{\theta }_N]^T\). The pretended type of Player i can be written as \(\theta ^{i,pr}=\theta _i+\delta ^i\). The cost of Player i in the pretenders’ game is given by:
Observing that \(e_i^TA^+e_i=1+C_i\), that \(e_i^TAA^+e_i=2C_i\) and that
the cost is written as:
Hence, the optimal pretending is given by:
Proposition 2
There exists an equilibrium in the pretenders’ game.
Proof
Consider the set \(B=[-2M,2M]^N\). The following claim will be used: Claim: If \(\tilde{\theta }\in B\), then \(\theta _i+\delta ^{i,\star }\in [-2M,2M]\), for each player i.
To prove the claim, observe that \(C_i\in [0,1]\) and thus \(2C_i/(1+C_i^2)\in [0,1]\) (since \(2C_i\le 1+C_i^2\) is equivalent to \((1-C_i)^2\ge 0\)). Furthermore, \(\frac{1}{2}e_i^TA^+\tilde{\theta }\in [-2M,2M]\). Hence,
The fact that \(\theta _i+\delta ^{i,\star }\) is a convex combination of two quantities in \([-2M,2M]\) completes the proof of the claim.
The claim shows that the best response maps B into B. Furthermore, B is compact and convex and the map is continuous. Thus, Brouwer’s fixed point theorem applies and the pretenders’ game has an equilibrium. \(\square \)
Equation (46) shows that the pretending of each player depends on its centrality \(C_i\). The plot of the function \(f(C_i)=2C_i/(1+C_i^2)\) is shown in Fig. 2. Furthermore, \(\alpha \) is bounded: \(|\alpha |\le 2M\). Thus, a player with low centrality tends to pretend less and the value of pretending is small.
In the following example, a game involving an influential player and many less influential players is analyzed.
Example 1
Consider a game with \(N+1\) players. The instantaneous cost for Player 0 is given by:
The instantaneous cost for each one of the players \(i=1,\dots ,N\) is given by:
The matrix A has the form:
Figure 3 illustrates the centrality \(C_0\) of the influential player 0 and the centrality \(C_i\) of any other player i, as the number of players increases. The centrality computations were made for \(a=0.5\).
As the number of players increases, only the influential player has a non-negligible motivation to pretend. Furthermore, the Nash pretenders’ outcome approaches the Stackelberg equilibrium with the influential player as the leader. \(\square \)
Remark 8
In Example 1, the players do not have differences in timing, commitment or the processing of the available information. However, the Nash pretenders’ outcome approaches the Stackelberg equilibrium as the number of players increases. This Stackelberg leadership appears intrinsically, without assuming different roles for the players a priori. Thus, pretending may serve as a natural Stackelberg leadership explanation.
Another way to understand why, when there is only one influential player, she may acquire a leader role is through the centrality measure \(C_i\). A high value of \(C_i\) means that there are some players who are affected by Player i’s pretended type, while at the same time Player i is interested in the actions of those same players. A less central player is less able to affect, through pretending, the actions of the players that in turn affect her.
5 An LQ Non-Cooperative Pollution Control Game
In this section, we consider a very simple linear quadratic pollution control game with two players. The full information, continuous-time analogue of this model was analyzed in [11]. Each Player \(i=1,2\) stands for a country. The dynamics is given by:
The state variable \(x_k\) represents the pollution stock, and the actions of the players \(u_k^1\) and \(u_k^2\) stand for the emissions of the two countries. The pollution stock of the environment decreases naturally with a rate \(0<a<1\).
The cost functions of the players are given by:
Cost (50) comprises the effects of two factors. The economic part is expressed by the difference \((u_k^i)^2-u_k^i\), whereas the term \(qx_k^2\) corresponds to the environmental cost.
We assume that both players care equally about pollution, i.e., the parameter \(q>0\) is common to both players, and that they incur the same economic loss from reducing their emissions. However, they may have different discount factors \(\rho _i\in [0.9,0.99]\). We assume that the players know the parameters q and a and that each player knows her own discount factor but is not aware of the discount factor of the other player.
Consider a pair of strategies \(u_k^1 = K_1^1x_k+K_2^1\), \(u_k^2 = K_1^2x_k+K_2^2\) constituting a Nash equilibrium of the corresponding full information game. Then, this pair is characterized by:
where \(\bar{a}^i = a+K_1^{-i}\) and \(v^i=K_2^{-i}+1/2\) and \(p^1, p^2>0\). The details of the optimal control problems are given in “Appendix sections A and B.”
Proposition 3
If \(0<a<1\), there exists a solution of (51)–(54) with \(p^1, p^2>0\). Thus, there exists a Nash equilibrium.
Proof
See “Appendix C.” \(\square \)
Assuming that each player is unaware of the discount factor of the other player, a simple adaptive rule, based on an estimate \(\hat{\rho }_{-i}\) of the opponent’s discount factor, may be used. The rule for Player i is described in Algorithm 1.
Remark 9
Step 2 of Algorithm 1 (also Step 2 of Algorithm 2) estimates the discount factor of the opponent based only on her last action. Of course several alternatives are available, but (55) is probably the simplest one. \(\square \)
A simple pretending strategy for Player i is given in Algorithm 2. In Algorithm 2, Player i uses the estimated discount factor of her opponent in order to pretend optimally.
Example 2
In this example we consider two players, both with discount factor 0.95. Furthermore, we assume that the parameters have the values \(q=0.1\) and \(a=0.8\).
At first, no player is pretending. After time step 20, Player 1 pretends by applying Algorithm 2, and after time step 40 both players apply Algorithm 2 (see footnote 1). For comparison, the trajectories of the players’ actions and of the state variable for the Nash equilibrium of the full information game are also presented. Figure 4 presents the actions of the players, Fig. 5 shows the evolution of the pollution stock, and Fig. 6 presents the evolution of the instantaneous cost for both players.
Remark 10
The numerical simulations show that the application of the dynamic rule described in Algorithm 1 yields results very similar to the Nash equilibrium of the corresponding full information game. In the case of pretending, the manipulating player has a motivation to pretend that she has a lower discount factor (i.e., that she cares less about the future) and relies on the other player to reduce her emissions. If one or both players pretend, the pollution stock increases. The non-pretending player needs to reduce her emissions, and the pollution is also larger. Thus, pretending in this example is not a cooperative action.
6 Electricity Market Models
In this section two competitive market models are analyzed. The first is a linear supply function duopoly model. A simple pretending algorithm is proposed, and the numerical simulation results show that pretending is a cooperative action, i.e., it increases the firms’ profits, while the price also increases. The second example considers a mechanism from the literature [34]. It is shown numerically that the attempts of the players to manipulate may prevent the mechanism from working at all.
6.1 Linear Supply Function Model
In this subsection, we assume that there are two competing production firms. Each one of them proposes a linear supply function [6], that is, at each price level she produces an energy quantity proportional to that price. We also consider a linear demand of the form \(q=A-Bp\), where q is the total energy demand and p the price. The instantaneous cost of each player is given by:
where \(q_i\) is the amount of energy produced by Player i, the quantity \(c_iq_i^2\) represents the production cost and \(pq_i\) the revenues for Player i.
Each player chooses the constant \(u_i\) of the linear supply function; that is, the quantity produced depends on the price according to the linear relation \(q_i=u_ip\). We assume that each player knows her own production cost but not the production cost of her opponent.
Simple manipulations show that the price and the instantaneous cost of the players depend on \(u_1,u_2\) according to:
and
The best response maps are given by:
It is easy to see that the best response map is contractive with respect to the infinity norm. Thus, if the players repeatedly apply (60), (61), the system approaches the Nash equilibrium of the static full information game.
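The convergence of the repeated best responses can be illustrated numerically. The maps (60), (61) themselves are not reproduced here; the sketch below rederives the best response under the stated model (market clearing \(u_1p+u_2p=A-Bp\) gives \(p=A/(B+u_1+u_2)\), and minimizing the instantaneous cost over \(u_i\) yields \(u_i=(B+u_{-i})/(1+2c_i(B+u_{-i}))\), our derivation under these assumptions), using the parameter values of Example 3:

```python
# Parameters of Example 3 (A renamed A_d to avoid clashing with a matrix name).
A_d, B, c = 1.0, 0.5, (1.0, 1.2)

def best_response(ci, u_other):
    # Minimizer of J_i(u) = A_d^2 (ci u^2 - u)/(B + u + u_other)^2 over u,
    # derived by setting dJ_i/du = 0 under the stated supply-function model.
    w = B + u_other
    return w / (1.0 + 2.0 * ci * w)

# Repeated best responses, as in the full-information adjustment process.
u = [0.1, 0.1]
for _ in range(100):
    u = [best_response(c[0], u[1]), best_response(c[1], u[0])]

p = A_d / (B + u[0] + u[1])
print([round(x, 4) for x in u], round(p, 4))
```

The map is a contraction (its slope \(1/(1+2c_iw)^2\) is below 1), so the iteration settles quickly; the limit can be checked to be a mutual best response.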
A simple pretending strategy for Player 1 is described in Algorithm 3.
A similar pretending algorithm may be used also by Player 2.
Example 3
Assume that \(A=1\), \(B=0.5\), \(c_1=1\), \(c_2=1.2\). Player 1 implements Algorithm 3 from time step \(k=1\), and Player 2 uses the best response until time step \(k=200\) and Algorithm 3 afterward. For both players, we assume that Algorithm 3 is implemented with \(\varepsilon =0.01\). Figure 7 shows the evolution of the pretended production costs, Fig. 8 presents the evolution of prices, and Fig. 9 shows the evolution of the players’ instantaneous profits (negative total costs).
Remark 11
In this example, the Nash pretenders’ outcome induces lower costs (higher value) for both players, compared to the Nash equilibrium of the corresponding static full information game. Furthermore, each player has a motivation to pretend that she has a higher production cost, in order to drive the prices higher. This in turn benefits the other player, for two different reasons. First, she acquires a larger market share, and second, she sells this larger amount of energy at a higher price. Thus, pretending is a cooperative action, i.e., each player prefers that the other player pretend rather than not pretend. \(\square \)
6.2 The Rasouli–Teneketzis Mechanism
In this subsection we analyze an example of an electricity market having many competing production firms. At first, we briefly review a market mechanism described in [34]. An algorithm to approximate the Nash equilibrium is then proposed and tested numerically. The ability of the participants to manipulate the rules of their opponents is then studied.
There are N energy producers \(P_1,\dots ,P_N\). Each one of them produces a quantity of energy \(q^i\) and proposes a price \(p^i\); that is, the action of a player is given by \(u^i = [q^i~p^i]^T\). Let us denote the total demand by \(D_0>0\). The production cost of Player \(P_i\) is given by \(C_i(q^i)=c_i(q^i)^2\). The instantaneous cost of each player is given by:
where \(t^i\) is the payment to the producer i made by the system operator. The payment \(t^i\) is given by:
where \(p^{N+1}\equiv p^1\) and
is the power surplus (or deficit if it is negative).
[34] studies the non-trivial Nash equilibria of the game, i.e., the Nash equilibria in which \(q^1=\dots =q^N=0\) does not hold. It is shown that in any such equilibrium the total production coincides with the demand, the proposed prices are identical, and the total energy is produced with the smallest total cost (the sum of the \(C_i\) is minimal). Furthermore, a non-trivial Nash equilibrium always exists. In this sense, the non-trivial Nash equilibria of the game produce “socially optimal” outcomes.
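The smallest-total-cost property has a simple closed form under the quadratic costs \(C_i(q^i)=c_i(q^i)^2\): the first-order condition \(2c_iq_i=\lambda \) gives \(q_i\propto 1/c_i\), with the scale fixed by \(\sum_i q_i=D_0\). A sketch verifying this for the cost coefficients of Example 4 (the demand value \(D_0\) below is illustrative, not taken from the paper):

```python
import numpy as np

c = np.array([0.5, 2.0, 3.0, 10.0])    # production-cost coefficients of Example 4
D0 = 1.0                                # total demand (illustrative value)

# With C_i(q) = c_i q^2, the condition 2 c_i q_i = lambda gives q_i
# proportional to 1/c_i; the constraint sum q_i = D0 fixes the scale.
q_star = D0 * (1 / c) / np.sum(1 / c)

def total_cost(q):
    return np.sum(c * q**2)

# No feasible perturbation (summing to zero) can lower the total cost,
# since the objective is strictly convex.
rng = np.random.default_rng(0)
for _ in range(100):
    d = rng.normal(size=4)
    d -= d.mean()                       # keep sum q_i = D0
    assert total_cost(q_star) <= total_cost(q_star + 0.01 * d) + 1e-12

print(np.round(q_star, 4))
```

The dispatch concentrates production on the cheapest firms, in line with the socially optimal outcome described above.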
We then propose a very simple algorithm (Algorithm 4). This algorithm is found numerically to converge asymptotically to a non-trivial Nash equilibrium of the game.
Lemma 1
If Algorithm 4 converges, then the limit point is a non-trivial Nash equilibrium of the game.
Proof
First, assume that the dynamics described by Algorithm 4 converges to a fixed point \((p_i^{\infty }, q_i^\infty )_{ i=1}^N\). For this fixed point, it holds \(\zeta \ge 0\): if \(\zeta <0\), we would have \(p_{k+1}^i = p_k^i+\delta _1\) for each i, and \((p_i^{\infty }, q_i^\infty )_{ i=1}^N\) would not be a fixed point. Thus, using (65), we conclude that \(q_i^\infty \) is a best response to the actions of the others. Furthermore, the prices \(p_i^\infty \) are fixed if they correspond to a best response to the actions of the other players. Thus, any fixed point of (65) and (67) corresponds to a Nash equilibrium of the game. Observe that the trivial Nash equilibrium is not a fixed point. Thus, any fixed point of the dynamics is a non-trivial Nash equilibrium. It remains to show that the limit of the dynamics is a fixed point. For the proof see “Appendix E.” \(\square \)
Example 4
In this example there are four production firms with production costs \(c_1=0.5, c_2=2, c_3=3, c_4= 10\). Figures 10 and 11 show the evolution of the energy production and the proposed prices assuming that the players follow Algorithm 4. \(\square \)
A very simple manipulating strategy is then described. A manipulating Player i implements Algorithm 4 as if she had a different (not real) production cost \(c^{i,pr}\). The pretended type is updated rarely but periodically according to the rule:
where \(\rho \) is a positive constant, T is the time interval between updates, and \(sat_{[-M,M]}(\cdot )\) is the saturation function \(sat_{[-M,M]}(x)=\min \{\max \{x,-M\},M\}\).
Remark 12
Pretending rule (68) does not estimate the production costs of the other players. Instead, it uses a gradient-descent-like rule to minimize the cost with respect to the pretended cost. A learning scheme closely related to (68) is directional learning [29]. Particularly, in [29] a player increases the probability of contributing in a public goods game if a previous-round switch from non-contributing to contributing increased her payoff, or a switch from contributing to non-contributing reduced her payoff. Similarly, a player reduces her probability of contributing if a previous-round switch from contributing to non-contributing increased her payoff, or a switch from non-contributing to contributing reduced her payoff. \(\square \)
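The gradient-descent-with-saturation idea behind (68) can be sketched in a few lines. The update below is an illustrative finite-difference variant, not the paper's exact rule, and the cost function of the pretended type is a hypothetical stand-in with a known minimizer:

```python
def sat(x, M):
    """Saturation of x to the interval [-M, M]."""
    return max(-M, min(M, x))

def pretend_update(c_pr, cost, rho=0.05, M=5.0, h=1e-3):
    # One gradient-descent-like step on the pretended cost, followed by
    # saturation, in the spirit of (68); the finite-difference probe is
    # our illustrative stand-in for the paper's update direction.
    grad = (cost(c_pr + h) - cost(c_pr - h)) / (2 * h)
    return sat(c_pr - rho * grad, M)

# Hypothetical stand-in for the player's cost as a function of her
# pretended production cost, minimized at c_pr = 2.
cost = lambda c: (c - 2.0) ** 2

c_pr = 0.0
for _ in range(500):
    c_pr = pretend_update(c_pr, cost)
print(round(c_pr, 3))   # converges to the minimizer 2.0
```

For this quadratic stand-in the iteration is a contraction with factor \(1-2\rho\), so it converges to the minimizer from any starting point inside \([-M,M]\).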
Example 5
In this example there are eight production firms. We assume that only Player 1 (having the smallest production cost) is pretending, implementing (68). Figure 12 illustrates the evolution of the prices, and Fig. 13 the evolution of the production quantities. It turns out that the prices proposed by the players converge to identical values. Furthermore, the pretending of Player 1 reduces her market share but increases the prices. Thus, pretending is slightly beneficial for Player 1, whereas it is more beneficial for the other players, who sell energy at higher prices and also obtain larger market shares. Thus, in this case pretending is a cooperative behavior. \(\square \)
Example 6
This example continues Example 5, assuming that all the players are pretending. Figure 14 shows the evolution of the prices, and Fig. 15 the evolution of the quantities. In this example, the prices grow without bound. The reason is that the optimal pretended production cost of each player increases when the pretended costs of the others increase. This dynamic leads to price divergence. Therefore, in this case the mechanism does not work at all. \(\square \)
Remark 13
The mechanism was designed to induce “socially optimal” behavior at the Nash equilibrium. However, each one of the players has a motivation to pretend. If all the players pretend, the system stops working altogether, as illustrated in the last example, where the prices diverge. \(\square \)
7 Conclusion
We studied the possibility of manipulation in game situations where the players use a dynamic rule to determine their future actions. Some alternative outcomes were proposed, and the results were applied to quadratic opinion games, a linear quadratic environmental game, and two electricity market models.
Several relationships between pretending and leadership were identified in the repeated quadratic game examples. Particularly, in the two-player simultaneous-move repeated game, the pretender has the same cost as if she were the Stackelberg leader; in the repeated Stackelberg game, if the follower pretends, she may achieve the same cost as if she were the leader in the corresponding inverse Stackelberg game; and in the case where there is an influential player and many players of low influence, the Nash pretenders’ outcome approaches the Stackelberg equilibrium as the number of players increases.
In the environmental game example, pretending leads to less cooperative behavior and more pollution. In the electricity market games, pretending may enhance the cooperation among the players and increase the prices, or even prevent the system from working at all.
The alternative outcomes proposed are by no means the only reasonable ones, and further work is needed. For example, how would a player react knowing that her opponent tries to manipulate her? Another direction of future research is the generalization of the results about centrality to dynamic game situations. Furthermore, it would be interesting to study how to design the network of interactions so that the motivation to manipulate is small.
Notes
We let the pretending behavior change over time, rather than assuming from the beginning that none, one, or both players pretend, so that a single set of figures can illustrate all the cases.
References
Acemoglu D, Yildiz M (2001) Evolution of perceptions and play. Mimeo, MIT. Available online https://economics.mit.edu/files/200
Antoniadis P, Fdida S, Griffin C, Jin Y, Kesidis G (2013) Distributed medium access control with dynamic altruism. In: Zheng J, Mitton N, Li J, Lorenz P (eds) Ad hoc networks. Springer, Berlin, pp 29–42
Atakan AE, Ekmekci M (2013) A two-sided reputation result with long-run players. J Econ Theory 148(1):376–392
Azad AP, Altman E, El-Azouzi R (2010) Routing games: from egoism to altruism. In: Modeling and optimization in mobile, ad hoc and wireless networks (WiOpt), proceedings of the 8th international symposium on, IEEE, 2010, pp 528–537
Azad AP, Musacchio J (2011) Unilateral altruism in network routing games with atomic players. In: Network games, control and optimization (NetGCooP), 5th international conference on, IEEE, 2011, pp 1–8
Baldick R, Grant R, Kahn E (2004) Theory and application of linear supply function equilibrium in electricity markets. J Regul Econ 25(2):143–167
Basar T (1973) On the relative leadership property of Stackelberg strategies. J Optim Theory Appl 11(6):655–661
Basar T, Olsder GJ (1999) Dynamic noncooperative game theory, 2nd edn. SIAM, Philadelphia, PA
Bonacich P (1987) Power and centrality: a family of measures. Am J Sociol 92:1170–1182
Chan YM, Cruz J (1983) Decentralized stochastic adaptive Nash games. Optim Control Appl Methods 4(2):163–178
Dockner EJ, Van Long N (1993) International pollution control: cooperative versus noncooperative strategies. J Environ Econ Manag 25(1):13–29
Dufwenberg M, Güth W (1999) Indirect evolution vs. strategic delegation: a comparison of two approaches to explaining economic institutions. Eur J Political Econ 15(2):281–295
Feldbaum AA (1960) Dual control theory. Autom Remote Control 21(9):874–1039
Foster DP, Young HP (2006) Regret testing: learning to play Nash equilibrium without knowing you have an opponent. Theor Econ 1:341–367
Fudenberg D, Levine D (1989) Reputation and equilibrium selection in games with a patient player. Econometrica 57(4):759–778
Fudenberg D, Levine DK (1998) The theory of learning in games, vol 2. MIT press, Cambridge
Gabriel SA, Conejo AJ, Fuller JD, Hobbs BF, Ruiz C (2012) Complementarity modeling in energy markets, vol 180. Springer, New York
Güth W (1995) An evolutionary approach to explaining cooperative behavior by reciprocal incentives. Int J Game Theory 24(4):323–344
Güth W, Peleg B (2001) When will payoff maximization survive? An indirect evolutionary analysis. J Evol Econ 11(5):479–499
Hamilton JH, Slutsky SM (1990) Endogenous timing in duopoly games: Stackelberg or Cournot equilibria. Games Econ Behav 2(1):29–46
Hart S, Mas-Colell A (2000) A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5):1127–1150
Heifetz A, Shannon C, Spiegel Y (2007) The dynamic evolution of preferences. Econ Theory 32(2):251–286
Heifetz A, Shannon C, Spiegel Y (2007) What to maximize if you must. J Econ Theory 133(1):31–57
Ho YC, Luh PB, Olsder GJ (1982) A control-theoretic view on incentives. Automatica 18(2):167–179
Konigstein M, Muller W (2000) Combining rational choice and evolutionary dynamics: the indirect evolutionary approach. Metroeconomica 51(3):235–256
Kreps DM, Wilson R (1982) Reputation and imperfect information. J Econ Theory 27(2):253–279
Lasaulce S, Tembine H (2011) Game theory and learning for wireless networks: fundamentals and applications. Academic Press, Waltham
Mailath GJ, Samuelson L (2006) Repeated games and reputations, vol 2. Oxford University Press, Oxford
Nax HH, Perc M (2015) Directional learning and the provisioning of public goods. Sci Rep 5:8010
Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab
Papavassilopoulos GP (1988) Adaptive games. In: Albeverio S, Blanchard P, Hazewinkel M, Streit W (eds) Stochastic processes in physics and engineering. Springer, Berlin, pp 223–236
Papavassilopoulos GP (1986) Iterative techniques for the Nash solution in quadratic games with unknown parameters. SIAM J Control Optim 24(4):821–834
Perc M, Szolnoki A (2010) Coevolutionary games–a mini review. BioSystems 99(2):109–125
Rasouli M, Teneketzis D (2014) Electricity pooling markets with strategic producers possessing asymmetric information ii: inelastic demand. arXiv preprint arXiv:1404.5539
Schipper BC (2011) Strategic control of myopic best reply in repeated games. Available at SSRN 1804667
Schipper BC (2015) Strategic teaching and learning in games. CoRR, abs/1504.06341, [Online]. Available http://arxiv.org/abs/1504.06341
Schipper BC (2009) Imitators and optimizers in Cournot oligopoly. J Econ Dyn Control 33(12):1981–1990
Selten R (2001) What is bounded rationality? In: Bounded rationality: the adaptive toolbox. MIT Press, Cambridge, pp 13–36
Sengul M, Gimeno J, Dial J (2012) Strategic delegation: a review, theoretical integration, and research agenda. J Manag 38(1):375–414
Shoham Y, Leyton-Brown K (2008) Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press, Cambridge
Simon HA (1972) Theories of bounded rationality. Decis Organ 1:161–176
Skoulidas C, Vournas C, Papavassilopoulos GP (2002) Adaptive game modeling of deregulated power markets. In: IEEE power engineering review, pp. 42–45
Szolnoki A, Perc M (2008) Coevolution of teaching activity promotes cooperation. New J Phys 10(4):043036
Witsenhausen HS (1968) A counterexample in stochastic optimum control. SIAM J Control 6(1):131–147
Wittenmark B (2002) Adaptive dual control. In: Control systems, robotics and automation, encyclopedia of life support systems (EOLSS), Developed under the auspices of the UNESCO. Eolss Publishers, Oxford
Yang WY, Papavassilopoulos GP (1994) Decentralised adaptive control in a game situation for discrete-time, linear, time-invariant systems. In: American control conference, vol. 3. IEEE, pp 3394–3399
Young HP (2004) Strategic learning and its limits. Oxford University Press, New York
This research has been cofinanced by the European Union (European Social Fund—ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)—Research Funding Program: THALES. Investing in knowledge society through the European Social Fund and the program ARISTEIA, project name HEPHAISTOS.
Appendices
Appendix A: Scalar LQ Optimal Control with an Additive Term
Consider the scalar system:
where v is a constant. Consider also the quadratic criterion:
Simple manipulations show that the optimal cost is given by:
and the constants p and \(c_1\) satisfy:
Furthermore, the optimal controller is given by:
Appendix B: The Optimal Control Problems of Section 5
The optimal control problems of Sect. 5 are then reduced to a pair of problems of the form studied in “Appendix A.”
Without loss of generality let us state the optimal control problem of Player 1. Assume that Player 2 uses a control policy in the form \(u^2_k = K_1^2x_k+K_2^2\). Then the cost function given by (50) is written as:
Ignoring the constant term, and setting \(\bar{a}=a+K_1^2\), \(\bar{u}_k=u_k^1-1/2\) and \(v=K_2^2+1/2\), the problem reduces to the minimization of (70) subject to (69). The optimal controller is given by:
Appendix C: Proof of Proposition 3
Let us first observe that (51), (53) do not depend on (52), (54). Thus, let us first show the existence of a solution of (51), (53). Consider the compact and convex domain \(D\subset \mathbb {R}^4\) with:
Consider also the mapping \(f:D\rightarrow \mathbb {R}^4\), with:
where \(p^{i,+}\) is the unique positive solution of:
and
It is not difficult to see that f maps D into D. For, observe that \(p^{i,+}>0\) and \(p^{i,+}<q+1\) and that \(\bar{a}^{i,+}\in (0,a)\). Applying Brouwer’s fixed point theorem, we conclude that there exists a solution of (51), (53).
Then, assume \(p^i\), \(K^i_1\), \(i=1,2\) is a solution of (51), (53). Equation (52) is equivalent to:
Using (54) and \(v^{-i}=K_2^i+1/2\), we obtain:
It is not difficult to see that \(\rho _i\frac{2p^i+\lambda _i}{2(1+\rho _i p^i)}<1\). For, observe that this inequality is equivalent to \(\rho _i\lambda _i<2 \), which can be written as:
which is trivially true, due to the fact that \(\bar{a}^i<1\) and \(\rho _i<1\).
Hence, the pair of equations obtained from (81) by substituting \(i=1\) and \(i=2\) has a unique solution. Substituting into (80) and (54), a solution of equations (51)–(54) is obtained.
Appendix D: Cost Computations
Some computations, needed to implement Algorithm 2, are then presented. Particularly, the value of the cost function:
is computed under the dynamics:
and the control law \(u_k=K_1x_k+K_2\).
The equilibrium point of the dynamics is given by \(\bar{x}= \bar{v}/(1-\tilde{a})\), where \(\tilde{a}=\bar{a}+K_1\) and \(\bar{v}=v+K_2\). The computation is simplified using the quantity \(z_k=x_k-\bar{x}\), which evolves according to \(z_{k+1}=\tilde{a}z_k\).
Simple manipulations show that J can be written as:
where \(s_1=q+K_1^2\), \(s_2=q\bar{x}+K_1^2\bar{x}+K_1K_2-K_1/2\) and \(s_3 = q\bar{x}^2+(K_1\bar{x}+K_2)^2-K_1\bar{x}-K_2\). Thus, assuming that \(\tilde{a}<1\), it holds:
To compute \(J_i(\gamma ^i,\gamma ^{-i})\) with \(\gamma ^j(x_k)= K_1^jx_k+K_2^j\) and \(j=1,2\), set \(v=K_2^{-i}\) and \(\bar{a} = a+K_1^{-i}\). The rest of the computations are performed with \(K_1=K_1^i\), \(K_2=K_2^i\).
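The computation above can be cross-checked by simulation. The sketch below assumes the cost takes the discounted form of (50), \(J=\sum_k\rho^k\left(qx_k^2+u_k^2-u_k\right)\), evaluates the resulting geometric series in closed form (our reconstruction, requiring \(\rho\tilde{a}^2<1\)), and compares it with a direct simulation of the closed loop; all numerical values are hypothetical:

```python
# Illustrative parameter values (hypothetical; chosen so that tilde_a < 1).
q, a_bar, v = 0.1, 0.8, 0.2
K1, K2 = -0.3, 0.1
rho = 0.95
x0 = 1.0

tilde_a = a_bar + K1
v_bar = v + K2
x_eq = v_bar / (1 - tilde_a)          # equilibrium point of the closed loop
z0 = x0 - x_eq

s1 = q + K1**2
s2 = q * x_eq + K1**2 * x_eq + K1 * K2 - K1 / 2
s3 = q * x_eq**2 + (K1 * x_eq + K2)**2 - K1 * x_eq - K2

# Closed form of J = sum_k rho^k (s1 z_k^2 + 2 s2 z_k + s3), with
# z_k = tilde_a^k z0 (our reconstruction; requires rho * tilde_a^2 < 1).
J_closed = (s1 * z0**2 / (1 - rho * tilde_a**2)
            + 2 * s2 * z0 / (1 - rho * tilde_a)
            + s3 / (1 - rho))

# Direct simulation of the closed loop x_{k+1} = a_bar x_k + u_k + v.
J_sim, x = 0.0, x0
for k in range(2000):
    u = K1 * x + K2
    J_sim += rho**k * (q * x**2 + u**2 - u)
    x = a_bar * x + u + v

print(abs(J_closed - J_sim) < 1e-8)    # True
```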
Appendix E: Proof of Lemma 1
It remains to show that the limit of the sequence is a fixed point. Assume that the sequence \((p_i^{k}, q_i^k)_{ i=1}^N\) obtained by iterations of Algorithm 4 converges to a limit \((p_i^{\infty }, q_i^\infty )_{ i=1}^N\). Then, there exists a \(k_0\) such that for any \(k\ge k_0\) it holds \(\zeta _k\ge 0\). Indeed, otherwise Step 3 of the algorithm would be used infinitely often and we would have \(\Vert (p_i^{k+1}, q_i^{k+1})_{ i=1}^N-(p_i^{k}, q_i^k)_{ i=1}^N\Vert \ge \delta _1\) for an infinite number of k. Thus, the sequence would not converge. Therefore, after \(k_0\) the terms of the sequence satisfy (65) and (67).
Observe that the function:
where \(p_i^{k+1}, q_i^{k+1}\) are given by (65) and (67) is continuous. Thus,
which completes the proof.
Kordonis, I., Charalampidis, A.C. & Papavassilopoulos, G.P. Pretending in Dynamic Games, Alternative Outcomes and Application to Electricity Markets. Dyn Games Appl 8, 844–873 (2018). https://doi.org/10.1007/s13235-017-0229-3