Abstract
A “zero-sum” Linear-Quadratic Gaussian Dynamic Game (LQGDG) where the players have partial information is considered. Specifically, the players’ initial state information and their measurements are private information, but each player is able to observe his antagonist’s past inputs: the protagonists’ past controls are shared information. Although this is a game with partial information, the control-sharing information pattern renders the game amenable to solution by the method of dynamic programming. The correct solution of LQGDGs with a control-sharing information pattern is obtained in closed form.
MSC Codes: 91A25, 93C41, 49N70
8.1 Introduction
The complete solution of Linear-Quadratic Gaussian Dynamic Games (LQGDGs) has been a longstanding goal of the controls and games communities. That LQGDGs with a nonclassical information pattern can be problematic has been amply illustrated in Witsenhausen’s seminal paper (Witsenhausen 1968)—see also Pachter and Pham (2014). Control theorists have traditionally emphasized control-theoretic aspects and the backward induction/dynamic programming solution method, which, however, is not applicable to dynamic games with partial information—with one notable exception, namely the game with partial information discussed herein. Game theorists have focused on information economics, that is, the role of information in games, but for the most part on discrete games. The state of affairs concerning dynamic games with partial information is not satisfactory; in this respect, the situation is not much different now than it was in 1971 when Witsenhausen made a similar observation (Witsenhausen 1971). In this article a careful analysis of dynamic games with partial information is undertaken. We focus exclusively on LQGDGs, which are more readily amenable to analysis. Indeed, Linear-Quadratic Dynamic Games (LQDGs) with perfect information stand out as far as applications of the theory of dynamic games are concerned: a canonical instance of an application of the theory of LQDGs can be found in Ho et al. (1965), where it has been shown that its solution yields the Proportional Navigation (PN) guidance law which is universally used in air-to-air missiles. Furthermore, the theory of LQDGs has been successfully applied to the synthesis of \(H_{\infty }\) control laws (Basar and Bernhard 2008). The theory of LQDGs with perfect information has therefore received a great deal of attention (Basar and Olsder 1995; Engwerda 2005; Pachter and Pham 2010).
In these works, the concepts of state, and state feedback, are emphasized and the solution method entails backward induction, a.k.a., Dynamic Programming (DP).
Concerning informational issues in LQGDGs: in previous work (Radner 1962; Pachter and Pham 2013) a static Linear-Quadratic Gaussian (LQG) team problem was addressed, and a static “zero-sum” LQG game with partial information was analyzed in Pachter (2013). In this article a dynamic “zero-sum” LQG game, that is, a LQGDG, where the players have partial information, is addressed. The information pattern is as follows. The players’ initial state information and their measurements are private information, but each player is able to observe the antagonist’s past inputs: the protagonists’ past controls are shared information. This information pattern has previously been discussed by Aoki (1973), and in the context of a team decision problem it has also been discussed in Sandell and Athans (1974). However, Aoki (1973) took “a wrong turn”: as so often happens in the literature of games with partial information, one is tempted to assume the players will try to second-guess the opponents’ private information, say, their measurements. The vicious cycle of second-guessing the opponent’s measurements leads to a mirror-gallery-like setting and to a dead end. This point is discussed in Sect. 8.4. Concerning reference Sandell and Athans (1974), where a decentralized dynamic team problem with a control-sharing information pattern is considered: it is argued that an infinite amount of information is contained in a real number, which, in theory, is correct. And since the control information is shared, then at least in a cooperative control/team setting, a player/agent could in principle encode his private information, for example his measurement history, in the controls about to be sent to his partner.
This is possible because the controls which are about to be communicated can be modified slightly to encode the measurement information of the protagonists without significantly disturbing the control and, consequently, with a barely noticeable effect on the value of the game. One then falls back on the solution of the LQG cooperative control problem with a one-step-delay shared information pattern (Kurtaran and Sivan 1974). However, this scheme has no place in an antagonistic scenario, a.k.a. the “zero-sum” LQGDG discussed in our paper, and it also does not properly model a decentralized control scenario. Moreover, this scheme totally depends on the players’ ability to obtain a noiseless observation of the broadcast control and, as such, exhibits a lack of robustness to measurement error and is not a viable proposition. In the end, it is acknowledged in Sandell and Athans (1974) that the control-sharing information pattern leads to a stochastic control problem that is ill posed, and it is stated that “the need for future work on this problem is obvious”. Unfortunately, the analysis of LQGDGs with a control-sharing information pattern presented in Aoki (1973) and Sandell and Athans (1974) is patently incorrect. In this paper LQGDGs with a control-sharing information pattern are revisited. A careful analysis reveals that although this is a game with partial information, the control-sharing information pattern renders the game amenable to solution by the method of DP. It is shown that the solution of the LQGDG with a control-sharing information pattern is similar in structure to the solution of the LQG optimal control problem insofar as the principle of certainty equivalence/decomposition holds. A correct closed-form solution of a LQGDG with a control-sharing information pattern is obtained.
The paper is organized as follows. The LQGDG problem statement and the attendant control-sharing information pattern are presented in Sect. 8.2. The state estimation algorithm required for the solution of LQGDGs with a control-sharing information pattern is developed in Sect. 8.3. The analysis of Linear-Quadratic Gaussian Games with a control-sharing information pattern is anchored in Sect. 8.4 where the end-game is solved and the solution of the LQGDG with a control-sharing information pattern is obtained in Sect. 8.5 using the method of backward induction/DP. The results are summarized in Sect. 8.6, followed by concluding remarks in Sect. 8.7. For the sake of completeness, the solution of the baseline deterministic LQDG game with perfect information (Pachter and Pham 2010) is included in the Appendix. The somewhat lengthy exposition could perhaps be excused in light of Witsenhausen’s observation when discussing LQG control (Witsenhausen 1971): “The most confused derivations of the correct results are also among the shortest”.
8.2 Linear Quadratic Gaussian Dynamic Game
Two player Linear Quadratic Gaussian Dynamic Games (LQGDGs) are considered. The players are designated P and E, and the game is specified as follows.
-
Dynamics: Linear
$$\displaystyle\begin{array}{rcl} x_{k+1} = A_{k}x_{k} + B_{k}u_{k} + C_{k}v_{k} +\varGamma _{k}w_{k},\ x_{0} \equiv x_{0},\ \ k = 0,\ldots,N - 1& & {}\end{array}$$(8.1)
At decision time k the controls of players P and E are \(u_{k}\) and \(v_{k}\), respectively. The process noise \(w_{k} \sim \mathcal{N}(0,Q_{p}),\ \ k = 0,\ldots,N - 1\). The planning horizon is N.
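For concreteness, the dynamics (8.1) can be propagated numerically. The sketch below is illustrative only: the matrices, the noise level and the placeholder controls are assumptions, not values from the chapter.

```python
import numpy as np

# Propagate x_{k+1} = A x_k + B u_k + C v_k + Gamma w_k, w_k ~ N(0, Q_p).
# All numerical values here are illustrative assumptions.
rng = np.random.default_rng(0)

n, N = 2, 5
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])    # P's input channel
C = np.array([[0.0], [0.05]])   # E's input channel
Gamma = np.eye(n)
Q_p = 0.01 * np.eye(n)

x = np.zeros(n)
for k in range(N):
    u_k = np.array([1.0])       # placeholder controls; the optimal
    v_k = np.array([-1.0])      # strategies are derived in Sect. 8.5
    w_k = rng.multivariate_normal(np.zeros(n), Q_p)
    x = A @ x + B @ u_k + C @ v_k + Gamma @ w_k
```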
-
Measurements: Linear
The N measurements of player P are:
At time k = 0, \(\overline{x}_{0}^{(P)}\)— player P believes that the initial state
$$\displaystyle x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(P)},P_{0}^{(P)}){}$$(8.2)
and thereafter he takes the measurements
$$\displaystyle z_{k+1}^{(P)} = H_{k+1}^{(P)}x_{k+1} + v_{k+1}^{(P)},\ \ v_{k+1}^{(P)} \sim \mathcal{N}(0,R_{m}^{(P)}),\ \ k = 0,\ldots,N - 2{}$$(8.3)
The N measurements of player E are:
At time k = 0, \(\overline{x}_{0}^{(E)}\)— player E believes that the initial state
$$\displaystyle x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(E)},P_{0}^{(E)}){}$$(8.4)
and thereafter he takes the measurements
$$\displaystyle z_{k+1}^{(E)} = H_{k+1}^{(E)}x_{k+1} + v_{k+1}^{(E)},\ \ v_{k+1}^{(E)} \sim \mathcal{N}(0,R_{m}^{(E)}),\ \ k = 0,\ldots,N - 2{}$$(8.5)
-
Cost/Payoff Function: Quadratic
We confine our attention to the antagonistic “zero-sum” game scenario where the respective P and E players strive to minimize and maximize the cost/payoff function
$$\displaystyle J = x_{N}^{T}Q_{F}x_{N} +\sum _{ k=0}^{N-1}\left (x_{k}^{T}Q_{k}x_{k} + u_{k}^{T}R_{k}^{(P)}u_{k} - v_{k}^{T}R_{k}^{(E)}v_{k}\right ){}$$(8.6)
Specifically, players P and E minimize and maximize their expected cost/payoff E(J∣⋅ ), conditional on their private information. The expectation operator is liberally used in the dynamic games literature, but oftentimes it is not clearly stated with respect to which random variables the expectation is calculated and on which random variables the expectation is conditional. This tends to mask the fact that what appear to be “zero-sum” games are in fact nonzero-sum games. Upon considering “zero-sum” games with partial information, the illusion is then created that a zero-sum game is considered. One then tends to rely on the uniqueness of the saddle point value and the interchangeability of non-unique optimal saddle point strategies in zero-sum games. This argument is flawed because, as previously discussed, in “zero-sum” games with partial information the P and E players calculate their respective cost and payoff conditional on their private information, as is correctly done in this paper; this is why the term zero-sum is set in quotation marks. Thus, although high-powered mathematics is oftentimes used, serious conceptual errors make the “results” not applicable. Contrary to statements sometimes encountered in the literature, in “zero-sum” games with partial information one cannot look for a saddle point solution; the correct solution concept is a Nash equilibrium, that is, a Person-by-Person Satisfactory (PBPS) solution. In this paper a unique Nash equilibrium is provided and the P and E players’ value functions are calculated.
-
Information pattern
-
1.
Public information
-
(a)
Problem parameters: \(A_{k}\), \(B_{k}\), \(C_{k}\), \(H_{k}^{(P)}\), \(H_{k}^{(E)}\), \(Q_{p}\), \(Q_{k}\), \(Q_{F}\), \(R_{k}^{(P)}\), \(R_{k}^{(E)}\), \(R_{m}^{(P)}\), \(R_{m}^{(E)}\).
-
(b)
Prior information: \(P_{0}^{(P)}\), \(P_{0}^{(E)}\).
-
2.
Private information
-
At decision time k = 0 the prior information of player P is \(\overline{x}_{0}^{(P)}\).
-
At decision time k = 0 the prior information of player E is \(\overline{x}_{0}^{(E)}\).
-
At decision time 1 ≤ k ≤ N − 1 the information of player P consists of his measurements \(\overline{x}_{0}^{(P)}, z_{1}^{(P)},\ldots,z_{k}^{(P)}\) and his ownship control history \(u_{0},\ldots,u_{k-1}\).
-
At decision time 1 ≤ k ≤ N − 1 the information of player E consists of his measurements \(\overline{x}_{0}^{(E)}, z_{1}^{(E)},\ldots,z_{k}^{(E)}\) and his ownship control history \(v_{0},\ldots,v_{k-1}\).
-
Sufficient statistics
-
The sufficient statistics of player P at decision time k = 0: \(x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(P)},P_{0}^{(P)})\).
-
The sufficient statistics of player E at decision time k = 0: \(x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(E)},P_{0}^{(E)})\).
-
The sufficient statistics of player P at decision time 1 ≤ k ≤ N − 1: the p.d.f. \(f_{k}^{(P)}(\cdot)\) of the physical state \(x_{k}\), as calculated by player P using his private information.
-
The sufficient statistics of player E at decision time 1 ≤ k ≤ N − 1: the p.d.f. \(f_{k}^{(E)}(\cdot)\) of the physical state \(x_{k}\), as calculated by player E using his private information.
-
Remark.
In the static LQGDG (Pachter 2013) where N = 1 the respective sufficient statistics of P and E are \(\overline{x}_{0}^{(P)}\) and \(\overline{x}_{0}^{(E)}\).
8.2.1 Problem Statement
The LQGDG (8.1)–(8.6) is considered and it is assumed that the players’ information sets are augmented as follows: At decision time k, \(k = 1,\ldots,N - 1\), player P is endowed with the additional information of the control history \(v_{0},\ldots,v_{k-1}\) of player E; thus, player P has observed the past inputs of player E. Similarly, at decision time k, \(k = 1,\ldots,N - 1\), player E is endowed with the additional information of the control history \(u_{0},\ldots,u_{k-1}\) of player P; thus, player E has observed the past inputs of player P.
The information pattern considered herein is referred to as the control-sharing information pattern. The dynamics and the measurement equations are linear, and the cost/payoff function is quadratic, but the information pattern is not classical. Strictly speaking, the information pattern is not partially nested because E’s measurements, which he used to form his controls, are not known to P, and vice versa, P’s measurements, which he used to form his controls, are not known to E. However, this is now a moot point because the information pattern is such that the control history of player E is known to player P, and vice versa, the control history of player P is known to player E. This, and the fact that players P and E each separately perceive the initial state \(x_{0}\) to be Gaussian, causes the state estimation problem faced by the players at decision time k to be Linear and Gaussian (LG). Hence, at decision time k, the knowledge of the complete control history \(u_{0},v_{0},\ldots,u_{k-1},v_{k-1}\) and their private measurement records makes it possible for both players to separately apply the linear Kalman filtering algorithm: based on his private measurements record, each player runs a Kalman filter using his measurements and the complete input history, and separately obtains an estimate of the state \(x_{k}\)—strictly speaking, the p.d.f. of \(x_{k}\) is separately obtained by each player. Thus, players P and E perceive the current state \(x_{k}\) to be Gaussian distributed. Having run their respective Kalman filters, at time k player P believes that the state
$$\displaystyle x_{k} \sim \mathcal{N}(\overline{x}_{k}^{(P)},P_{k}^{(P)}){}$$(8.7)
and player E believes that the state
$$\displaystyle x_{k} \sim \mathcal{N}(\overline{x}_{k}^{(E)},P_{k}^{(E)}){}$$(8.8)
but they are also aware that their state estimates are correlated—see Sect. 8.3.
Since the LQGDG (8.1)–(8.6) is LG, the P and E players’ separately calculated sufficient statistics are given by Eqs. (8.7) and (8.8), and their controls will be determined by their optimal strategies according to \(u_{k}^{{\ast}} = (\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)},P_{k}^{(P)}))^{{\ast}}\) and \(v_{k}^{{\ast}} = (\gamma _{k}^{(E)}(\overline{x}_{k}^{(E)},P_{k}^{(E)}))^{{\ast}}\). In fact, we shall show that in LQGDGs with a control-sharing information pattern the optimal strategies are of the form
$$\displaystyle u_{k} = \gamma _{k}^{(P)}(\overline{x}_{k}^{(P)}){}$$(8.9)
$$\displaystyle v_{k} = \gamma _{k}^{(E)}(\overline{x}_{k}^{(E)}){}$$(8.10)
and are linear.
8.3 Kalman Filtering
LQGDGs with a control-sharing information pattern are Linear-Gaussian (LG) and consequently at decision time k each player can separately calculate his estimate of the physical state x k using a linear Kalman Filter (KF). Player P runs the KF
and so at decision time k player P obtains his estimate \(\overline{x}_{k}^{(P)}\) of the state x k . Similarly, player E runs the KF
and so at decision time k player E obtains his estimate \(\overline{x}_{k}^{(E)}\) of the state \(x_{k}\). The P and E players can calculate their respective state estimation error covariances and Kalman gains \(P_{k}^{(P)}\), \(K_{k+1}^{(P)}\), \(P_{k}^{(E)}\) and \(K_{k+1}^{(E)}\) ahead of time and off line.
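The KF each player runs can be sketched as follows. This is a minimal illustration, not the chapter’s numbered equations; the key point is that the predict step uses the shared control history while the update uses the player’s private measurement. All numerical values are assumptions.

```python
import numpy as np

# A minimal sketch of the Kalman filter each player runs. Because the
# control history is shared, the predict step uses the exact forcing
# B u_k + C v_k; only the measurement update is player-specific.

def kf_step(xbar, P, u, v, z, A, B, C, Gamma, Q_p, H, R_m):
    """One predict/update cycle: (xbar_k, P_k) -> (xbar_{k+1}, P_{k+1})."""
    # Predict, using the shared controls u_k (P's) and v_k (E's).
    x_pred = A @ xbar + B @ u + C @ v
    P_pred = A @ P @ A.T + Gamma @ Q_p @ Gamma.T
    # Update with this player's private measurement z_{k+1}.
    S = H @ P_pred @ H.T + R_m
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(xbar)) - K @ H) @ P_pred
    return x_new, P_new

# Illustrative numbers only (assumed, not from the chapter): player P's view.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B, C = np.array([[0.0], [0.1]]), np.array([[0.0], [0.05]])
Gamma, Q_p = np.eye(2), 0.01 * np.eye(2)
H_P, R_mP = np.array([[1.0, 0.0]]), np.array([[0.1]])

xbar, P = np.zeros(2), np.eye(2)          # player P's prior
xbar, P = kf_step(xbar, P, np.array([1.0]), np.array([0.0]),
                  np.array([0.2]), A, B, C, Gamma, Q_p, H_P, R_mP)
```

Player E would run the identical recursion with his own \(H^{(E)}\), \(R_m^{(E)}\) and private measurements, which is why the two error covariances can differ while both filters stay consistent.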
In LQGDGs with a control-sharing information pattern the players’ sufficient statistic is their state estimate; the latter is the argument of their strategy functions (8.9) and (8.10). Hence, in the process of countering E’s action, P must compute the statistics of E’s state estimate \(\overline{x}_{k}^{(E)}\), and, vice versa, while planning his move, E must compute the statistics of P’s state estimate \(\overline{x}_{k}^{(P)}\). Momentarily assume the point of view of player P: As far as P is concerned, the unknown to him state estimate of player E at time k, \(\overline{x}_{k}^{(E)}\), is a random variable (and consequently E’s input at time k is a random variable). Similarly, player E will consider the unknown to him state estimate of player P at time k, \(\overline{x}_{k}^{(P)}\), to be a random variable (and consequently P’s input at time k is a random variable). Hence, in the LQGDG with a control-sharing information pattern, at time k player P will estimate E’s state estimate \(\overline{x}_{k}^{(E)}\) using his calculated ownship state estimate \(\overline{x}_{k}^{(P)}\), and vice versa, player E will estimate P’s state estimate \(\overline{x}_{k}^{(P)}\) using his calculated ownship state estimate \(\overline{x}_{k}^{(E)}\). Thus, in the LQGDG with a control-sharing information pattern and with his state estimate \(\overline{x}_{k}^{(P)}\) at time k in hand, player P calculates the statistics of E’s state estimate \(\overline{x}_{k}^{(E)}\), conditional on the public and private information available to him at time k. Similarly, having obtained at time k his state estimate \(\overline{x}_{k}^{(E)}\), player E calculates the statistics of the state estimate \(\overline{x}_{k}^{(P)}\) of player P, conditional on the public and private information available to him at time k. Let’s start at decision time k = 0.
Player P models his measurement/estimate \(\overline{x}_{0}^{(P)}\) of the initial state x 0 as
where \(x_{0}\) is the true physical state and \(e_{0}^{(P)}\) is player P’s measurement/estimation error, whose statistics, in view of Eq. (8.2), are \(e_{0}^{(P)} \sim \mathcal{N}(0,P_{0}^{(P)})\). In addition, player P models player E’s measurement \(\overline{x}_{0}^{(E)}\) of the initial state \(x_{0}\) as
where, as before, \(x_{0}\) is the true physical state and \(e_{0}^{(E)}\) is player E’s measurement/estimation error, whose statistics, which are known to P—see Eq. (8.4)—are \(e_{0}^{(E)} \sim \mathcal{N}(0,P_{0}^{(E)})\). The Gaussian random variables \(e_{0}^{(P)}\) and \(e_{0}^{(E)}\) are independent—by hypothesis. From player P’s point of view, \(\overline{x}_{0}^{(P)}\) is known but \(\overline{x}_{0}^{(E)}\) is a random variable. Subtracting Eq. (8.21) from Eq. (8.22), at time k = 0 player P concludes that, as far as he is concerned, player E’s measurement, upon which E will decide, according to Eq. (8.10), on his optimal control \(v_{0}^{{\ast}}\), is the random variable
In other words, as far as P is concerned, E’s estimate \(\overline{x}_{0}^{(E)}\) of the initial state x 0 is the Gaussian random variable
Thus, player P has used his measurement/private information \(\overline{x}_{0}^{(P)}\) and the public information \(P_{0}^{(P)}\) and \(P_{0}^{(E)}\) to calculate the statistics of the sufficient statistic \(\overline{x}_{0}^{(E)}\) of player E, which is the argument of E’s strategy function \(\gamma _{0}^{(E)}(\cdot)\); the latter, along with P’s control \(u_{0}\), will feature in player P’s cost functional. Similarly, as far as player E is concerned, at time k = 0 the statistics of the sufficient statistic \(\overline{x}_{0}^{(P)}\) of player P are
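The k = 0 reasoning above can be checked numerically in the scalar case; the numbers below are illustrative assumptions, not values from the chapter.

```python
import numpy as np

# Numerical sanity check (scalar, illustrative numbers) of the k = 0
# reasoning: subtracting xbar_0^(P) = x_0 + e_0^(P) from
# xbar_0^(E) = x_0 + e_0^(E) gives xbar_0^(E) = xbar_0^(P) - e_0^(P) + e_0^(E),
# so, as modeled by P, xbar_0^(E) ~ N(xbar_0^(P), P_0^(P) + P_0^(E)):
# the independent errors' covariances add.

rng = np.random.default_rng(1)
P0P, P0E = 0.5, 0.3     # illustrative scalar prior covariances
xbar0P = 2.0            # P's recorded prior estimate
n = 400_000

e0P = rng.normal(0.0, np.sqrt(P0P), n)
e0E = rng.normal(0.0, np.sqrt(P0E), n)
xbar0E = xbar0P - e0P + e0E   # samples of E's estimate, as P models it

print(xbar0E.mean(), xbar0E.var())  # ≈ 2.0 and ≈ 0.8
```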
Similar to the case where k = 0, as far as player P is concerned the state estimate of player E at decision time k ≥ 1 is the random variable
that is, at decision time k player P believes that the state estimate \(\overline{x}_{k}^{(E)}\) of player E is
where the covariance matrix
At the decision time instants \(k = 1,\ldots,N - 1\) the P and E players’ respective state estimation errors \(e_{k}^{(P)}\) and \(e_{k}^{(E)}\) are now correlated—this is caused by the process dynamics being driven in part by the common process noise.
Similarly, as far as he is concerned, player E believes that at decision time k the state estimate \(\overline{x}_{k}^{(P)}\) of player P is the random variable
Concerning decision time k ≥ 1: Let the covariance matrix
It can be shown that the recursion for the correlation matrix \(\tilde{P}_{k}^{(E,P)}\) is
In summary, at decision time \(k = 0,\ldots,N - 1\) player P believes that the statistics of E’s estimate \(\overline{x}_{k}^{(E)}\) of the state x k are given by Eq. (8.26) and player E believes that the statistics of P’s estimate \(\overline{x}_{k}^{(P)}\) of the state x k are given by Eq. (8.27) where
The KF covariance matrices \(P_{k}^{(P)}\), \(P_{k}^{(E)}\) and \(\tilde{P}_{k}^{(E,P)}\) are calculated ahead of time by solving the respective recursions (8.12), (8.13), (8.15); (8.17), (8.18), (8.20); and (8.29).
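Since all the covariance recursions are data-independent, they can indeed be run offline. The sketch below assumes one standard form of the cross-covariance propagation (the chapter’s Eq. (8.29) is not reproduced here): with mutually independent measurement noises, the two estimation errors are coupled only through the common process noise, and \(\tilde{P}_{k}^{(E,P)}\) is assembled as the covariance of the difference of the two errors. All numerical values are illustrative.

```python
import numpy as np

# Offline covariance bookkeeping (a sketch; assumed form). S_k denotes the
# cross-covariance E[e_k^(P) (e_k^(E))^T]; the covariance of the difference
# of the errors, i.e. Ptilde_k^(E,P), is then P_k^(P) + P_k^(E) - S_k - S_k^T.

def predict_cov(P, A, Gamma, Q_p):
    return A @ P @ A.T + Gamma @ Q_p @ Gamma.T

def kf_gain(P_pred, H, R_m):
    return P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R_m)

A = np.array([[1.0, 0.1], [0.0, 1.0]])
Gamma, Q_p = np.eye(2), 0.01 * np.eye(2)
H_P, R_P = np.array([[1.0, 0.0]]), np.array([[0.1]])   # P's sensor
H_E, R_E = np.array([[0.0, 1.0]]), np.array([[0.2]])   # E's sensor

P_P, P_E = 0.5 * np.eye(2), 0.3 * np.eye(2)  # P_0^(P), P_0^(E)
S = np.zeros((2, 2))                          # errors independent at k = 0
I = np.eye(2)

for k in range(5):
    Pp = predict_cov(P_P, A, Gamma, Q_p)
    Pe = predict_cov(P_E, A, Gamma, Q_p)
    Sp = predict_cov(S, A, Gamma, Q_p)    # common process noise couples the errors
    K_P, K_E = kf_gain(Pp, H_P, R_P), kf_gain(Pe, H_E, R_E)
    P_P = (I - K_P @ H_P) @ Pp
    P_E = (I - K_E @ H_E) @ Pe
    S = (I - K_P @ H_P) @ Sp @ (I - K_E @ H_E).T

Ptilde = P_P + P_E - S - S.T   # covariance of e_k^(P) - e_k^(E)
```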
Finally, since in LQGDGs with a control-sharing information pattern the sufficient statistic is the players’ state estimate, then upon employing the method of Dynamic Programming, at decision time k player P must project ahead the estimate of the physical state \(x_{k+1}\) that the Kalman filtering algorithm will provide at time k + 1. It can be shown that at time k player P believes that the future state \(x_{k+1}\) at time k + 1 will be the Gaussian random variable
Similarly, at decision time k player E’s estimate of the state \(x_{k+1}\) at time k + 1 will be the Gaussian random variable
8.4 End Game
In the best tradition of backward induction/Dynamic Programming, the terminal stage of the game, namely, the players’ decision time \(k = N - 1\), is analyzed first. In the end game the cost/payoff function is
It is convenient to momentarily set \(Q_{F}:= Q + Q_{F}\) whereupon the terminal cost/payoff
The players’ sufficient statistics in this LG game are the expectation of the physical state and the covariance of the state’s estimation error: having run their respective Kalman filters during the time interval [1, N − 1], at decision time N − 1 the information available to player P is \((\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\) and the information of player E is \((\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\). In other words, at decision time N − 1 player P believes the physical state \(x_{N-1}\) to be \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\) whereas player E believes the physical state \(x_{N-1}\) to be \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\). This is tantamount to stipulating that players P and E took separate measurements of the state \(x_{N-1}\). The quality of the players’ “instruments” used to take the measurements and also the degree of correlation of the players’ measurement errors are public knowledge—we refer to the measurement error covariances \(P_{N-1}^{(P)}\), \(P_{N-1}^{(E)}\) and \(\tilde{P}_{N-1}^{(E,P)}\). At the same time, the recorded measurements \(\overline{x}_{N-1}^{(P)}\) and \(\overline{x}_{N-1}^{(E)}\) are the private information of the respective players P and E: the “measurement” \(\overline{x}_{N-1}^{(E)}\) recorded by player E is his private information and is not shared with player P. Thus, player P has partial information. Similarly, the “measurement” \(\overline{x}_{N-1}^{(P)}\) recorded by player P is his private information and is not shared with player E, so player E also has partial information.
To gain a better appreciation of the informational issues in games with partial information, it is instructive to briefly digress and employ an “appealing” approach which is familiar to workers in deterministic control and which, unfortunately, is an approach sometimes employed in stochastic games. We now intentionally “take a wrong turn” which quickly leads us to a dead end. A correct analysis of the informational situation at hand follows.
Consider the following flawed argument: At time N − 1 the state information available to player P is \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\) and thus Player P calculates the expectation of his cost function
At the same time the state information available to player E is \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\) and Player E calculates the expectation of his payoff function
Now Player P’s optimization, that is, the differentiation of his deterministic cost function (8.33), yields the relationship
and Player E’s optimization, that is, the differentiation of his deterministic payoff function (8.34), yields the relationship
We have obtained two equations in the players’ optimal controls, namely, the two unknowns \(u_{N-1}^{{\ast}}\) and \(v_{N-1}^{{\ast}}\), which players P and E must separately solve in order to calculate their respective optimal controls. However, player P cannot solve the set of two equations (8.35) and (8.36) because he does not know the “measurement” \(\overline{x}_{N-1}^{(E)}\) of E, and player E cannot solve this set of two equations because he does not know the “measurement” \(\overline{x}_{N-1}^{(P)}\) of P—both players have reached a dead end, and it would appear that all that is left to do is to try to guess and outguess the opponent’s “measurement”. This state of affairs is caused by the players having partial information. This approach brings on the much maligned infinite regress in reciprocal reasoning! Unfortunately, this flawed approach is not foreign to the literature on dynamic stochastic games and it leads to erroneous “results”—see Aoki (1973) where, using this flawed argument, the LQGDG with a control-sharing information pattern was “solved” and complicated “strategies” were computed.
We now change course and undertake a correct analysis of our LQGDG with a shared-control information pattern. To this end, it is imperative that one thinks in strategic terms. The strategies available to player P are mappings \(f: R^{n} \rightarrow R^{m_{u}}\) from his information set into his actions set; thus, the action of player P is \(u_{N-1} = f(\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\). Similarly, the strategies available to player E are mappings \(g: R^{n} \rightarrow R^{m_{v}}\) from his information set into his actions set—thus, the action of player E is \(v_{N-1} = g(\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\). However, we’ll show in the sequel that it suffices to consider P and E strategies of the form (8.9) and (8.10), respectively.
It is now important to realize that from player P’s vantage point, the action v N−1 of player E is a random variable. This is so because as far as player P is concerned the measurement \(\overline{x}_{N-1}^{(E)}\) of player E used in (8.10) to form his control v N−1 is a random variable. Similarly, from player E’s vantage point, the action u N−1 of player P is also a function of a random variable, \(\overline{x}_{N-1}^{(P)}\).
Consider the decision process of player P whose private information is \(\overline{x}_{N-1}^{(P)}\). He operates against the strategy g(⋅ ) of player E. Therefore, from player P’s perspective, the random variables at work are \(x_{N-1}\) and \(\overline{x}_{N-1}^{(E)}\). At decision time \(k = N - 1\) player P is confronted with a stochastic optimization problem and he calculates the expectation of the cost function (8.32), conditional on his private information \(\overline{x}_{N-1}^{(P)}\),
By correctly using in the calculation of his expected cost (8.37) player E’s strategy function \(g(\overline{x}_{N-1}^{(E)})\) rather than, as before, player E’s control \(v_{N-1}\), player P has eliminated the possibility of an infinite regress in reciprocal reasoning. This is so because P now has all the information to be able, in principle, to calculate the said expectation. Thus, player P calculates his expected cost
Player P calculates the expectations with respect to the random variable \(\overline{x}_{N-1}^{(E)}\) which features in Eq. (8.38), cognizant that \(\overline{x}_{N-1}^{(E)} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},\tilde{P}_{N-1}^{(E,P)})\). In this game with partial information, player P is using his measurement/private information \(\overline{x}_{N-1}^{(P)}\) and the public information to estimate the sufficient statistic \(\overline{x}_{N-1}^{(E)}\) of player E, which is the argument of E’s strategy function g(⋅ ); the latter features in player P’s cost functional (8.38) and thus enters the calculation of P’s cost.
The careful analysis of the optimization problem at hand leads to a Fredholm equation of the second kind of the convolution type with a kernel which is a Gaussian function; the unknown functions are the players’ optimal strategies. Taking the point of view of player E yields a similar Fredholm integral equation in the players’ optimal strategies. The solution of the set of two Fredholm equations yields the optimal strategies of players P and E. The optimal strategies turn out to be linear after all! The reader is referred to reference Pachter (2013) for the complete derivation.
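The linearity claim can be sketched heuristically as follows; this is an outline under the linear-strategy ansatz, and the full Fredholm-equation derivation is in Pachter (2013).

```latex
% Ansatz (an assumption, verified a posteriori): E plays a linear strategy
%   g(\overline{x}_{N-1}^{(E)}) = G_E\,\overline{x}_{N-1}^{(E)} .
% From P's viewpoint
%   \overline{x}_{N-1}^{(E)} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},\tilde{P}_{N-1}^{(E,P)}),
% so the Gaussian-kernel integral in P's Fredholm equation collapses:
\mathbb{E}\!\left[\, g(\overline{x}_{N-1}^{(E)}) \;\middle|\; \overline{x}_{N-1}^{(P)} \right]
  = G_E\, \mathbb{E}\!\left[\, \overline{x}_{N-1}^{(E)} \;\middle|\; \overline{x}_{N-1}^{(P)} \right]
  = G_E\, \overline{x}_{N-1}^{(P)} .
% Substituting this into the two coupled optimality conditions turns the pair of
% Fredholm integral equations into linear algebraic equations for the gain
% matrices G_P and G_E, whose unique solution confirms the linear ansatz.
```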
8.5 Dynamic Programming
We consider the LQGDG (8.1)–(8.6) with a control-sharing information pattern, as in Aoki (1973). The planning horizon is N ≥ 2.
8.5.1 Sufficient Statistics
The initial state information and the measurements of players P and E are their private information but their past controls are shared information. Even though the players have partial information because the initial state information and their measurements are not shared, from the point of view of both players P and E the control system is nevertheless Linear-Gaussian (LG). This is so because at decision time k the adversary’s information state components \(v_{0},\ldots,v_{k-1}\) and \(u_{0},\ldots,u_{k-1}\), respectively, are not random variables with unknown p.d.f.s but are known to the players: the LQGDG with a control-sharing information pattern is LG and therefore the conditions for the players’ information states to be Gaussian hold, and at decision time k the sufficient statistics of P and E are \(\overline{x}_{k}^{(P)}\) and \(\overline{x}_{k}^{(E)}\), respectively. Furthermore, as far as player P is concerned, at time k the sufficient statistic \(\overline{x}_{k}^{(E)}\) of player E is the random variable \(\overline{x}_{k}^{(E)} \sim \mathcal{N}(\overline{x}_{k}^{(P)},\tilde{P}_{k}^{(E,P)})\) and he uses this information in the calculation of his cost-to-go/value function at time k. Similarly, player E considers the sufficient statistic \(\overline{x}_{k}^{(P)}\) of player P to be \(\overline{x}_{k}^{(P)} \sim \mathcal{N}(\overline{x}_{k}^{(E)},\tilde{P}_{k}^{(E,P)})\) and uses this information in the calculation of his cost-to-go/value function at time k.
8.5.2 Analysis
The analysis is along the lines of the analysis of the static LQG game with partial information (Pachter 2013) and the analysis of the end game in Sect. 8.4 where \(k = N - 1\). We shall require
Proposition 1.
The value functions of players P and E are quadratic in their respective sufficient statistics \(\overline{x}_{k}^{(P)}\) and \(\overline{x}_{k}^{(E)}\) , that is
where
□
Similar to the correct approach outlined in Sect. 8.4 we calculate the value functions by taking the expectations over the relevant random variables.
where the random variable \(\tilde{w} \equiv e_{k}^{(P)} - e_{k}^{(E)} \sim \mathcal{N}(0,\tilde{P}_{k}^{(E,P)})\).
8.5.3 Optimization
Consider the minimization problem faced by P at decision time 0 ≤ k ≤ N − 2: Differentiating the RHS of Eq. (8.39) in his control \(u_{k}\) he obtains the optimality condition
and similarly, upon differentiating the RHS of Eq. (8.40) in \(v_{k}\) player E obtains
Player P has obtained an expression for his optimal control \(u_{k}^{{\ast}}\) in which \(\overline{x}_{k}^{(E)}\) does not feature and \(u_{k}^{{\ast}}\) is a function of the parameter \(\overline{x}_{k}^{(P)}\) only. However, the strategy function \(\gamma _{k}^{(E)}(\cdot)\) of player E features in this equation. Indeed, the following strategic relationship holds
We have obtained an expression for P’s optimal strategy function \((\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)}))^{{\ast}}\) in terms of the strategy \(\gamma _{k}^{(E)}(\cdot)\) of player E. Player P has obtained a linear relationship which directly ties together the as yet unknown optimal strategies \((\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)}))^{{\ast}}\) and \((\gamma _{k}^{(E)}(\overline{x}_{k}^{(E)}))^{{\ast}}\) of players P and E. Similarly, player E also obtains a linear relationship among the players’ optimal strategies:
Equations (8.43) and (8.44) constitute a linear system of Fredholm integral equations of the second kind in the players’ optimal strategies \((\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)}))^{{\ast}}\) and \((\gamma _{k}^{(E)}(\overline{x}_{k}^{(E)}))^{{\ast}}\). Similar to the analysis in reference Pachter (2013), the solution of the linear system of Fredholm integral equations of the second kind, Eqs. (8.43) and (8.44), yields the optimal strategies which are linear in the players’ sufficient statistics, namely
and the formulae for the optimal gains
The control system is Linear-Gaussian (LG), and therefore the players’ information states are Gaussian. Time consistency in this dynamic game is provided by the application of the method of Dynamic Programming (DP), where the DP state is the information state, and, by construction, the strategies are Person-By-Person-Satisfactory (PBPS). Thus, in the LQGDG with a control-sharing information pattern, a Nash equilibrium is obtained, as was also the case in the static LQG game with partial information (Pachter 2013).
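The mechanism by which a linear system of best-response relations pins down both strategies at once can be seen in a toy example (the scalar coefficients below are invented and are not the kernels of Eqs. (8.43)–(8.44)): when each player's gain is affine in the other's, the pair of relations is a small linear system with a unique solution.

```python
import numpy as np

# Toy illustration (not the chapter's Fredholm system): each player's
# best-response gain is affine in the other's gain,
#   F = a + b*G,   G = c + d*F,
# so the equilibrium gains solve one linear system.
a, b, c, d = -0.8, 0.3, 0.5, 0.2          # made-up coefficients

# Direct solve of [[1, -b], [-d, 1]] [F; G] = [a; c]
F, G = np.linalg.solve(np.array([[1.0, -b], [-d, 1.0]]),
                       np.array([a, c]))

# Best-response (Jacobi) iteration converges to the same fixed
# point because |b*d| < 1
Fi, Gi = 0.0, 0.0
for _ in range(200):
    Fi, Gi = a + b * Gi, c + d * Fi
print(F, G)
```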
8.5.4 Value Functions
The parameters which specify the statistics of the random variables in the LQGDG do not feature in the formulae (8.45) and (8.46) for the players’ optimal strategies, and consequently an inspection of the DP equations (8.39) and (8.40) tells us that the matrices \(\Pi_k\) will not be a function of the said parameters; in other words, the matrices \(\Pi_k\) are exclusively determined by the deterministic plant’s parameters \(A\), \(B\), \(C\), \(Q\), \(Q_F\), \(R_c^{(P)}\) and \(R_c^{(E)}\). Hence \(\Pi_k = P_k\), where \(P_k\) is the solution of the Riccati equation (8.57) derived for the deterministic LQDG discussed in the Appendix. Upon defining \(P_{k}:= P_{k} + Q\), the optimal gains correspond to the optimal gains in the deterministic LQDG, Eqs. (8.61) and (8.62) in the Appendix, and the players’ optimal gains are
The recursions for the scalars \(c_k^{(P)}\) and \(c_k^{(E)}\) are obtained from the respective DP equations (8.39) and (8.40):
and for \(k = N - 1\) we use the end-game equation
Similarly,
and for \(k = N - 1\) we use the end-game equation
Remark.
Only the parameters \(A\), \(B\), \(C\), \(Q\), \(Q_F\), \(R_c^{(P)}\) and \(R_c^{(E)}\) feature in the Riccati equation for \(P_k\), as if the game were the deterministic LQDG. The players’ measurement matrices, the process noise parameters and the measurement noise covariances do not feature in Eq. (8.57). However, the solution \(P_k\) of the Riccati equation (8.57) and the LQGDG’s measurement-related parameters \(H^{(P)}\), \(H^{(E)}\), the process noise parameters, \(R_m^{(P)}\) and \(R_m^{(E)}\), and the Kalman gains all enter the recursions for the “intercepts” \(c^{(P)}\) and \(c^{(E)}\).
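The backward recursion for such a game Riccati equation can be sketched numerically. The block below is a generic zero-sum LQ Riccati recursion, not a transcription of Eq. (8.57), and all matrices (A, B, C, Q, QF, Ru, Rv) are illustrative placeholders; the existence checks mirror the positivity conditions of the kind stated in Theorem A1.

```python
import numpy as np

# Generic backward recursion for a zero-sum LQ game Riccati equation
# (a sketch, not Eq. (8.57) verbatim; all matrices are made-up examples).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])          # minimizer P's input matrix
C = np.array([[0.0], [0.05]])         # maximizer E's input matrix
Q = np.eye(2) * 0.5
QF = np.eye(2)
Ru, Rv = np.array([[1.0]]), np.array([[4.0]])
N = 10

P = QF
for k in range(N - 1, -1, -1):
    G = np.hstack([B, C])                        # stacked input matrix
    Rblk = np.block([[Ru, np.zeros((1, 1))],
                     [np.zeros((1, 1)), -Rv]])   # min/max sign convention
    M = Rblk + G.T @ P @ G
    # existence-type conditions: Ru + B'PB > 0 and Rv - C'PC > 0
    assert (Ru + B.T @ P @ B > 0).all() and (Rv - C.T @ P @ C > 0).all()
    P = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)
    P = 0.5 * (P + P.T)                          # keep symmetry numerically
print(P)  # the game-value matrix at time 0
```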
8.6 Main Result
The analysis of the LQGDG with a control-sharing information pattern is summarized in the following
Theorem 1.
Consider the LQGDG (8.1)–(8.6) with the information pattern:
1. The P and E players’ prior information is given in Eqs. (8.2) and (8.4), respectively. The prior information \(\overline{x}_{0}^{(P)}\) and \(\overline{x}_{0}^{(E)}\) is private information of the respective P and E players and is not shared between them. The covariances \(P_{0}^{(P)}\) and \(P_{0}^{(E)}\) are finite and are public information.
2. At decision time 1 ≤ k ≤ N − 1 the measurements of player P and player E are \(z_k^{(P)}\) and \(z_k^{(E)}\), and their measurement equations are Eqs. (8.3) and (8.5), respectively. At decision time 1 ≤ k ≤ N − 1 the respective measurement records \(Z_{k}^{(P)} =\{ z_{1}^{(P)},\ldots,z_{k}^{(P)}\}\) and \(Z_{k}^{(E)} =\{ z_{1}^{(E)},\ldots,z_{k}^{(E)}\}\) are the private information of players P and E, and the measurements are not shared between the players.
3. At decision time \(k = 1,\ldots,N - 1\) the P and E players have complete recall of their respective ownship control histories \(U_{k} =\{ u_{0},\ldots,u_{k-1}\}\) and \(V _{k} =\{ v_{0},\ldots,v_{k-1}\}\) .
4. The players observe their opponent’s moves: at decision time 1 ≤ k ≤ N − 1 the control history \(V _{k} =\{ v_{0},\ldots,v_{k-1}\}\) of player E is known to player P and, similarly, player E knows the control history \(U_{k} =\{ u_{0},\ldots,u_{k-1}\}\) of player P.
The players obtain their private state estimates \(\overline{x}_{k}^{(P)}\) and \(\overline{x}_{k}^{(E)}\) by running two separate Kalman Filters (KFs) in parallel, driven by their private prior information and their separate measurements: player P initializes his KF (8.11)–(8.15) with his prior information \((\overline{x}_{0}^{(P)},P_{0}^{(P)})\) and uses his measurements \(z_k^{(P)}\); similarly, player E initializes his KF (8.16)–(8.20) with his prior information \((\overline{x}_{0}^{(E)},P_{0}^{(E)})\) and uses his measurements \(z_k^{(E)}\). Both players use the shared complete input history.
The players reuse the optimal state feedback strategies derived for the deterministic LQDG, as provided by Theorem A1: in Eq. (8.61) player P sets \(x_{k}:= \overline{x}_{k}^{(P)}\) and in Eq. (8.62) player E sets \(x_{k}:= \overline{x}_{k}^{(E)}\).
A Nash equilibrium for the “zero-sum” LQGDG with a control-sharing information pattern is established. The value functions of players P and E are
where the matrices \(P_k\) are the solution of the Riccati equation (8.57). The “intercepts” \(c_k^{(P)}\) and \(c_k^{(E)}\) are obtained by solving the respective scalar recursions [(8.49), (8.50), (8.48)] and [(8.51), (8.52), (8.47)]. The covariance matrices \(P_k^{(P)}\), \(P_k^{(E)}\) and \(\tilde{P}_{k}^{(E,P)}\) feature exclusively in the intercepts’ recursions. The matrices \(\tilde{P}_{k}^{(E,P)}\) are given by the solution of the Lyapunov-like linear matrix equation (8.29). The control Riccati equation (8.57), the KF Riccati equation (8.12), (8.13), (8.15) of player P, the KF Riccati equation (8.17), (8.18), (8.20) of player E, and the Lyapunov-like linear matrix equation (8.29) can all be solved ahead of time and off line. Once the three Riccati equations and the Lyapunov equation have been solved, the value functions’ “intercepts” \(c_k^{(P)}\) and \(c_k^{(E)}\) are also obtained off line. □
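The two-filter structure in the theorem can be sketched as follows. This is a generic Kalman filter step with placeholder matrices, not a transcription of Eqs. (8.11)–(8.20); the point is that the time update is driven by the full, shared input pair (u, v), while the measurement update uses only the player's private measurement.

```python
import numpy as np

# Sketch of one player's KF step under the control-sharing pattern
# (generic KF equations; all matrices below are illustrative).
def kf_step(xbar, Pcov, u, v, z, A, B, C, H, W, V):
    # time update: propagate with BOTH players' (shared) controls
    x_pred = A @ xbar + B @ u + C @ v
    P_pred = A @ Pcov @ A.T + W                  # W: process noise covariance
    # measurement update with the player's PRIVATE measurement z
    S = H @ P_pred @ H.T + V                     # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    xbar_new = x_pred + K @ (z - H @ x_pred)
    Pcov_new = (np.eye(len(xbar)) - K @ H) @ P_pred
    return xbar_new, Pcov_new

# Usage with made-up plant and noise matrices
A = np.eye(2); B = np.array([[0.0], [1.0]]); C = np.array([[0.0], [0.5]])
H = np.array([[1.0, 0.0]]); W = 0.01 * np.eye(2); V = np.array([[0.1]])
x, P0 = np.zeros(2), np.eye(2)
x, P1 = kf_step(x, P0, np.array([1.0]), np.array([0.0]),
                np.array([0.3]), A, B, C, H, W, V)
```

Each player runs this step with his own prior and measurement stream; only the pair (u, v) is common to both filters.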
8.7 Conclusion
Linear-Quadratic Gaussian Dynamic Games with a control-sharing information pattern have been considered. The players’ initial state information and their measurements are private information, but each player is able to observe his antagonist’s past inputs: the protagonists’ past controls are shared information. Although this is a game with partial information, the control-sharing information pattern renders the game amenable to solution by the method of DP, and a Nash equilibrium for the “zero-sum” LQGDG is established. The attendant optimal strategies are linear and certainty equivalence holds. The linearity of the optimal strategies has not been artificially imposed from the outset but follows from the LQG nature of the optimization problem at hand, courtesy of the control-sharing information pattern. The correct solution of LQGDGs with a control-sharing information pattern is obtained in closed form.
References
Aoki M (1973) On decentralized linear stochastic control. IEEE Trans Autom Control 18(3): 243–250
Basar T, Bernhard P (2008) \(H^{\infty }\)-optimal control and related minimax design problems: a dynamic game approach. Birkhäuser, Boston
Basar T, Olsder GJ (1995) Dynamic non-cooperative game theory. Academic, New York
Engwerda J (2005) LQ dynamic optimization and differential games. Wiley, New York
Fuzhen Z (2005) The Schur complement and its applications. Springer, Berlin
Ho YC, Bryson AE, Baron S (1965) Differential games and optimal pursuit-evasion strategies. IEEE Trans Autom Control 10(4):385–389
Kurtaran BZ, Sivan R (1974) Linear-quadratic-Gaussian control with one-step-delay sharing pattern. IEEE Trans Autom Control 19(5):571–574
Pachter M (2013) Static linear-quadratic Gaussian games. In: Krivan V, Zaccour G (eds) Advances in dynamic games. Birkhäuser, Boston, pp 85–105
Pachter M, Pham K (2010) Discrete-time linear-quadratic dynamic games. J Optim Theory Appl 146(1):151–179
Pachter M, Pham K (2013) Static teams and stochastic games. In: Pardalos PM, et al (eds) Dynamics of information systems: algorithmic approaches. Springer, New York, pp 147–176
Pachter M, Pham K (2014) Informational issues in decentralized control. In: Vogiatzis C, et al (eds) Dynamics of information systems: computational and mathematical challenges. Springer, New York, pp 45–76
Radner R (1962) Team decision problems. Ann Math Stat 33(3):857–881
Sandell NR, Athans M (1974) Solution of some nonclassical LQG stochastic decision problems. IEEE Trans Autom Control 19(2):108–116
Witsenhausen HS (1968) A counterexample in stochastic optimal control. SIAM J Control 6(1):131–147
Witsenhausen HS (1971) Separation of estimation and control for discrete time systems. Proc IEEE 59(11):1557–1566
Appendix: Linear-Quadratic Dynamic Game
The solution of Linear-Quadratic Dynamic Games (LQDGs) with perfect information, a.k.a. deterministic LQDGs, was derived in Pachter and Pham (2010, Theorem 2.1). The Schur complement concept (Fuzhen 2005) was used to invert a blocked \((m_{u} + m_{v}) \times (m_{u} + m_{v})\) matrix which contains four blocks, its two diagonal blocks being an \(m_u \times m_u\) matrix and an \(m_v \times m_v\) matrix. We further improve on the results of Pachter and Pham (2010) by noting that a matrix with four blocks has two Schur complements, say \(S_B\) and \(S_C\). This allows one to obtain explicit and symmetric formulae for the P and E players’ optimal strategies, thus yielding the complete solution of the deterministic LQDG. These results are used in this paper, and for the sake of completeness, the closed-form solution of the perfect information/deterministic zero-sum LQDG is included herein.
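The two Schur complements of a symmetric block matrix can be checked numerically. The block below is a generic example (the matrices are invented, not the game's specific block matrix): it verifies the standard identity that the diagonal blocks of the inverse are the inverses of the two Schur complements.

```python
import numpy as np

# Numerical illustration of the two Schur complements of a 2x2 block
# matrix M = [[A, B], [B.T, C]]: S_C = A - B C^{-1} B.T (Schur complement
# of C) and S_B = C - B.T A^{-1} B (Schur complement of A).
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # m_u x m_u block
C = np.array([[2.0]])                    # m_v x m_v block
B = np.array([[0.5], [0.2]])             # off-diagonal block
M = np.block([[A, B], [B.T, C]])

S_C = A - B @ np.linalg.inv(C) @ B.T
S_B = C - B.T @ np.linalg.inv(A) @ B

# The (1,1) block of M^{-1} equals S_C^{-1}; the (2,2) block equals S_B^{-1}.
Minv = np.linalg.inv(M)
```

Having both complements available is what yields the symmetric formulae for the two players' gains.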
The linear dynamics are
Player P is the minimizer and his control \(u_{k} \in R^{m_{u}}\). Player E is the maximizer and his control \(v_{k} \in R^{m_{v}}\). The planning horizon is N. The cost/payoff functional is quadratic:
and \(Q\) and \(Q_F\) are real symmetric matrices. The players’ control effort weighting matrices \(R_u\) and \(R_v\) are typically real symmetric and positive definite. Oftentimes it is stipulated that the state penalty matrices \(Q\) and \(Q_F\) also be positive definite, or at least positive semi-definite; these assumptions can be relaxed. The following holds.
Theorem A1.
A necessary and sufficient condition for the existence of a solution to the deterministic zero-sum LQDG (8.53) and (8.54) is
and
\(\forall \ k = 1,\ldots,N - 1\) , where the real, symmetric matrices \(P_k\) are the solution of the Riccati difference equation
In Eq. (8.57), the first Schur complement matrix function
In addition, the problem’s parameters must satisfy the conditions
and
The value of the LQDG is
The players’ optimal strategies are the linear state feedback control laws
In Eq. (8.62) the second Schur complement matrix function
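The extraction of the two linear feedback gains can be sketched numerically. The block below is a generic first-order-condition computation for a zero-sum LQ stage problem, not a transcription of Eqs. (8.61)–(8.62), and every matrix in it is an illustrative placeholder.

```python
import numpy as np

# Generic sketch: with P standing for the next-stage Riccati matrix, the
# stacked first-order conditions of min_u max_v give the linear system
#   [[Ru + B'PB,  B'PC     ],  [[Fu],     [[B'PA],
#    [C'PB,  -Rv + C'PC]]   *   [Fv]] = -  [C'PA]],
# so u* = Fu x and v* = Fv x.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[0.0], [0.05]])
P = np.eye(2) * 2.0
Ru, Rv = np.array([[1.0]]), np.array([[4.0]])

M = np.block([[Ru + B.T @ P @ B, B.T @ P @ C],
              [C.T @ P @ B, -Rv + C.T @ P @ C]])
rhs = -np.vstack([B.T @ P @ A, C.T @ P @ A])
Fuv = np.linalg.solve(M, rhs)
Fu, Fv = Fuv[:1], Fuv[1:]       # the two players' feedback gain rows
```

Inverting M block-wise via its two Schur complements is what produces the explicit, symmetric gain formulae.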
© 2016 Springer International Publishing Switzerland

Pachter, M. (2016). Linear-Quadratic Gaussian Dynamic Games with a Control-Sharing Information Pattern. In: Thuijsman, F., Wagener, F. (eds) Advances in Dynamic and Evolutionary Games. Annals of the International Society of Dynamic Games, vol 14. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-28014-1_8