MSC Codes: 91A25, 93C41, 49N70

8.1 Introduction

The complete solution of Linear-Quadratic Gaussian Dynamic Games (LQGDGs) has been a longstanding goal of the controls and games communities. That LQGDGs with a nonclassical information pattern can be problematic has been amply illustrated in Witsenhausen’s seminal paper (Witsenhausen 1968)—see also Pachter and Pham (2014). Control theorists have traditionally emphasized control-theoretic aspects and the backward induction/dynamic programming solution method, which, however, is not applicable to dynamic games with partial information—one notable exception being the game with partial information discussed herein. Game theorists, for their part, have focused on information economics, that is, the role of information in games, but for the most part on discrete games. The state of affairs concerning dynamic games with partial information is not satisfactory. In this respect, the situation is not much different now than it was in 1971, when Witsenhausen made a similar observation (Witsenhausen 1971). In this article a careful analysis of dynamic games with partial information is undertaken. We exclusively focus on LQGDGs, which are more readily amenable to analysis. Indeed, Linear-Quadratic Dynamic Games (LQDGs) with perfect information stand out as far as applications of the theory of dynamic games are concerned: a canonical instance of an application of the theory of LQDGs can be found in Ho et al. (1965), where it is shown that its solution yields the Proportional Navigation (PN) guidance law which is universally used in air-to-air missiles. Furthermore, the theory of LQDGs has been successfully applied to the synthesis of \(H_{\infty }\) control laws (Basar and Bernhard 2008). The theory of LQDGs with perfect information has therefore received a great deal of attention (Basar and Olsder 1995; Engwerda 2005; Pachter and Pham 2010). In these works the concepts of state and state feedback are emphasized, and the solution method entails backward induction, a.k.a. Dynamic Programming (DP).

Concerning informational issues in LQGDGs: in previous work (Radner 1962; Pachter and Pham 2013) a static Linear-Quadratic Gaussian (LQG) team problem was addressed, and a static “zero-sum” LQG game with partial information was analyzed in Pachter (2013). In this article a dynamic “zero-sum” LQG game, that is, an LQGDG in which the players have partial information, is addressed. The information pattern is as follows. The players’ initial state information and their measurements are private information, but each player is able to observe the antagonist’s past inputs: the protagonists’ past controls are shared information. This information pattern has previously been discussed by Aoki (1973), and, in the context of a team decision problem, it has also been discussed in Sandell and Athans (1974). However, Aoki (1973) took “a wrong turn”: as so often happens in the literature on games with partial information, one is tempted to assume that the players will try to second-guess the opponents’ private information, say, their measurements. The vicious cycle of second-guessing the opponent’s measurements leads to a mirror-gallery-like setting and to a dead end. This point is discussed in Sect. 8.4. Concerning reference Sandell and Athans (1974), where a decentralized dynamic team problem with a control-sharing information pattern is considered: it is argued there that an infinite amount of information is contained in a real number, which, in theory, is correct. And since the control information is shared, then, at least in a cooperative control/team setting, a player/agent could in principle encode his private information, for example his measurement history, in the controls about to be sent to his partner. This is because the controls about to be communicated can be modified slightly to encode the protagonists’ measurement information without significantly disturbing the control and, consequently, with a barely noticeable effect on the value of the game. One then falls back on the solution of the LQG cooperative control problem with a one-step-delay shared information pattern (Kurtaran and Sivan 1974). However, this scheme has no place in an antagonistic scenario, a.k.a. the “zero-sum” LQGDG discussed in this paper, and it also does not properly model a decentralized control scenario. Moreover, this scheme depends entirely on the players’ ability to obtain a noiseless observation of the broadcast control and as such exhibits a lack of robustness to measurement error; it is not a viable proposition. In the end it is acknowledged in Sandell and Athans (1974) that the control-sharing information pattern leads to a stochastic control problem that is ill posed, and it is stated that “the need for future work on this problem is obvious”. Unfortunately, the analysis of LQGDGs with a control-sharing information pattern presented in Aoki (1973) and Sandell and Athans (1974) is patently incorrect. In this paper LQGDGs with a control-sharing information pattern are revisited. A careful analysis reveals that although this is a game with partial information, the control-sharing information pattern renders the game amenable to solution by the method of DP. It is shown that the solution of the LQGDG with a control-sharing information pattern is similar in structure to the solution of the LQG optimal control problem in so far as the principle of certainty equivalence/decomposition holds. A correct closed-form solution of an LQGDG with a control-sharing information pattern is obtained.

The paper is organized as follows. The LQGDG problem statement and the attendant control-sharing information pattern are presented in Sect. 8.2. The state estimation algorithm required for the solution of LQGDGs with a control-sharing information pattern is developed in Sect. 8.3. The analysis of Linear-Quadratic Gaussian games with a control-sharing information pattern is anchored in Sect. 8.4, where the end game is solved; the solution of the LQGDG with a control-sharing information pattern is then obtained in Sect. 8.5 using the method of backward induction/DP. The results are summarized in Sect. 8.6, followed by concluding remarks in Sect. 8.7. For the sake of completeness, the solution of the baseline deterministic LQDG with perfect information (Pachter and Pham 2010) is included in the Appendix. The somewhat lengthy exposition may perhaps be excused in light of Witsenhausen’s observation when discussing LQG control (Witsenhausen 1971): “The most confused derivations of the correct results are also among the shortest”.

8.2 Linear Quadratic Gaussian Dynamic Game

Two player Linear Quadratic Gaussian Dynamic Games (LQGDGs) are considered. The players are designated P and E, and the game is specified as follows.

  • Dynamics: Linear

    $$\displaystyle\begin{array}{rcl} x_{k+1} = A_{k}x_{k} + B_{k}u_{k} + C_{k}v_{k} +\varGamma _{k}w_{k},\ x_{0} \equiv x_{0},\ \ k = 0,\ldots,N - 1& & {}\end{array}$$
    (8.1)

At decision time k the controls of players P and E are \(u_{k}\) and \(v_{k}\), respectively. The process noise \(w_{k} \sim \mathcal{N}(0,Q_{p}),\ \ k = 0,\ldots,N - 1\). The planning horizon is N. A simulation sketch of the dynamics and the players’ measurements is given after the Remark below.

  • Measurements: Linear

The N measurements of player P are:

At time k = 0, player P’s measurement is \(\overline{x}_{0}^{(P)}\): player P believes that the initial state

$$\displaystyle\begin{array}{rcl} x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(P)},P_{ 0}^{(P)}),& &{}\end{array}$$
(8.2)

and thereafter he takes the measurements

$$\displaystyle\begin{array}{rcl} z_{k+1}^{(P)} = H_{ k+1}^{(P)}x_{ k+1} + v_{k+1}^{(P)},\ \ v_{ k+1}^{(P)} \sim \mathcal{N}(0,R_{ m}^{(P)}),\ k = 0,\ldots,N - 2& &{}\end{array}$$
(8.3)

The N measurements of player E are:

At time k = 0, player E’s measurement is \(\overline{x}_{0}^{(E)}\): player E believes that the initial state

$$\displaystyle\begin{array}{rcl} x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(E)},P_{0}^{(E)}),& &{}\end{array}$$
(8.4)

and thereafter he takes the measurements

$$\displaystyle\begin{array}{rcl} z_{k+1}^{(E)} = H_{ k+1}^{(E)}x_{ k+1} + v_{k+1}^{(E)},\ \ v_{ k+1}^{(E)} \sim \mathcal{N}(0,R_{ m}^{(E)}),\ k = 0,\ldots,N - 2& &{}\end{array}$$
(8.5)
  • Cost/Payoff Function: Quadratic

We confine our attention to the antagonistic “zero-sum” game scenario where the respective P and E players strive to minimize and maximize the cost/payoff function

$$\displaystyle\begin{array}{rcl} J = x_{N}^{T}Q_{ F}x_{N} +\sum _{ k=0}^{N-1}[x_{ k+1}^{T}Q_{ k+1}x_{k+1} + u_{k}^{T}R_{ k}^{(P)}u_{ k} - v_{k}^{T}R_{ k}^{(E)}v_{ k}] \rightarrow min_{\{u_{k}\}_{k=0}^{N-1}}max_{\{v_{k}\}_{k=0}^{N-1}}& &{}\end{array}$$
(8.6)

Specifically, players P and E minimize and maximize their expected cost/payoff E(J∣⋅ ), conditional on their private information. The expectation operator is liberally used in the dynamic games literature, but oftentimes it is not clearly stated with respect to which random variables the expectation is calculated and on which random variables it is conditioned. This tends to mask the fact that what appear to be “zero-sum” games are in fact nonzero-sum games. When “zero-sum” games with partial information are considered, the illusion is thus created that a genuine zero-sum game is at hand. One is then tempted to rely on the uniqueness of the saddle point value and on the interchangeability of non-unique optimal saddle-point strategies in zero-sum games. This argument is flawed because, as previously discussed, in “zero-sum” games with partial information the P and E players calculate their respective cost and payoff conditional on their private information, as is correctly done in this paper; this is why the term zero-sum is put in quotation marks. Thus, although high-powered mathematics is oftentimes used, serious conceptual errors render the “results” inapplicable. Contrary to statements sometimes encountered in the literature, in “zero-sum” games with partial information one cannot look for a saddle-point solution; the correct solution concept is a Nash equilibrium, that is, a Person-by-Person Satisfactory (PBPS) solution. In this paper a unique Nash equilibrium is provided and the P and E players’ value functions are calculated.

  • Information pattern

    1. Public information

      (a) Problem parameters: \(A_{k}\), \(B_{k}\), \(C_{k}\), \(H_{k}^{(P)}\), \(H_{k}^{(E)}\), \(Q_{p}\), \(Q_{k}\), \(Q_{F}\), \(R_{k}^{(P)}\), \(R_{k}^{(E)}\), \(R_{m}^{(P)}\), \(R_{m}^{(E)}\).

      (b) Prior information: \(P_{0}^{(P)}\), \(P_{0}^{(E)}\).

    2. Private information

      • At decision time k = 0 the prior information of player P is \(\overline{x}_{0}^{(P)}\).

      • At decision time k = 0 the prior information of player E is \(\overline{x}_{0}^{(E)}\).

      • At decision time 1 ≤ k ≤ N − 1 the information of player P consists of his measurements \(\overline{x}_{0}^{(P)}, z_{1}^{(P)},\ldots,z_{k}^{(P)}\) and his ownship control history \(u_{0},\ldots,u_{k-1}\).

      • At decision time 1 ≤ k ≤ N − 1 the information of player E consists of his measurements \(\overline{x}_{0}^{(E)}, z_{1}^{(E)},\ldots,z_{k}^{(E)}\) and his ownship control history \(v_{0},\ldots,v_{k-1}\).

  • Sufficient statistics

    • The sufficient statistics of player P at decision time k = 0: \(x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(P)},P_{0}^{(P)})\).

    • The sufficient statistics of player E at decision time k = 0: \(x_{0} \sim \mathcal{N}(\overline{x}_{0}^{(E)},P_{0}^{(E)})\).

    • The sufficient statistics of player P at decision time 1 ≤ k ≤ N − 1: the p.d.f. \(f_{k}^{(P)}(\cdot)\) of the physical state \(x_{k}\), as calculated by player P using his private information.

    • The sufficient statistics of player E at decision time 1 ≤ k ≤ N − 1: the p.d.f. \(f_{k}^{(E)}(\cdot)\) of the physical state \(x_{k}\), as calculated by player E using his private information.

Remark.

In the static LQGDG (Pachter 2013) where N = 1 the respective sufficient statistics of P and E are \(\overline{x}_{0}^{(P)}\) and \(\overline{x}_{0}^{(E)}\).
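To make the game specification (8.1)–(8.6) concrete, the following minimal simulation sketch (in Python/NumPy) propagates the dynamics (8.1) and generates the two players’ private measurement sequences (8.3) and (8.5). All numerical values, dimensions and the zero placeholder controls are hypothetical choices for illustration only; they are not taken from the paper, and the optimal controls are derived only in Sect. 8.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem data: n = 2 states, scalar controls, time-invariant matrices.
n, N = 2, 5
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[0.0], [0.08]])
Gam = np.eye(n)
Qp = 0.01 * np.eye(n)                              # process noise covariance Q_p
HP, HE = np.eye(n), np.eye(n)                      # measurement matrices H^(P), H^(E)
RmP, RmE = 0.05 * np.eye(n), 0.05 * np.eye(n)      # measurement noise covariances

# Priors (8.2), (8.4): private means, public covariances.
x0 = rng.multivariate_normal(np.zeros(n), np.eye(n))
P0P, P0E = 0.2 * np.eye(n), 0.3 * np.eye(n)
xbar0P = x0 + rng.multivariate_normal(np.zeros(n), P0P)   # private to P
xbar0E = x0 + rng.multivariate_normal(np.zeros(n), P0E)   # private to E

x, zP, zE = x0, [], []
for k in range(N - 1):
    u_k, v_k = np.zeros(1), np.zeros(1)            # placeholder controls (shared after use)
    w_k = rng.multivariate_normal(np.zeros(n), Qp)
    x = A @ x + B @ u_k + C @ v_k + Gam @ w_k      # dynamics (8.1)
    zP.append(HP @ x + rng.multivariate_normal(np.zeros(n), RmP))   # measurement (8.3)
    zE.append(HE @ x + rng.multivariate_normal(np.zeros(n), RmE))   # measurement (8.5)
```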

8.2.1 Problem Statement

The LQGDG (8.1)–(8.6) is considered and it is assumed that the players’ information sets are augmented as follows. At decision time k, \(k = 1,\ldots,N - 1\), player P is endowed with the additional information of the control history \(v_{0},\ldots,v_{k-1}\) of player E; thus, player P has observed the past inputs of player E. Similarly, at decision time k, \(k = 1,\ldots,N - 1\), player E is endowed with the additional information of the control history \(u_{0},\ldots,u_{k-1}\) of player P; thus, player E has observed the past inputs of player P.

The information pattern considered herein is referred to as the control-sharing information pattern. The dynamics and the measurement equations are linear, and the cost/payoff function is quadratic, but the information pattern is not classical. Strictly speaking, the information pattern is not partially nested because E’s measurements, which he used to form his controls, are not known to P, and vice versa, P’s measurements, which he used to form his controls, are not known to E. However, this is now a moot point because the information pattern is such that the control history of player E is known to player P, and vice versa, the control history of player P is known to player E. This, and the fact that players P and E each separately perceive the initial state \(x_{0}\) to be Gaussian, causes the state estimation problem faced by the players at decision time k to be Linear and Gaussian (LG). Hence, at decision time k, the knowledge of the complete control history \(u_{0},v_{0},\ldots,u_{k-1},v_{k-1}\) and of their private measurement records makes it possible for both players to separately apply the linear Kalman filtering algorithm: each player runs a Kalman filter driven by his own measurements and the complete input history, and separately obtains an estimate of the state \(x_{k}\)—strictly speaking, each player separately obtains the p.d.f. of \(x_{k}\). Thus, players P and E perceive the current state \(x_{k}\) to be Gaussian distributed. Having run their respective Kalman filters, at time k player P believes that the state

$$\displaystyle\begin{array}{rcl} x_{k} \sim \mathcal{N}(\overline{x}_{k}^{(P)},P_{ k}^{(P)})\,\ \ \ \forall \ k,\ \ N - 1 \geq k \geq 0& &{}\end{array}$$
(8.7)

and player E believes that the state

$$\displaystyle\begin{array}{rcl} x_{k} \sim \mathcal{N}(\overline{x}_{k}^{(E)},P_{ k}^{(E)})\,\ \ \ \forall \ k,\ \ N - 1 \geq k \geq 0& &{}\end{array}$$
(8.8)

but they are also aware that their state estimates are correlated—see Sect. 8.3.

Since the LQGDG (8.1)–(8.6) is LG, the P and E players’ separately calculated sufficient statistics are given by Eqs. (8.7) and (8.8), and their controls will be determined by their optimal strategies according to \(u_{k}^{{\ast}} = (\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)},P_{k}^{(P)}))^{{\ast}}\) and \(v_{k}^{{\ast}} = (\gamma _{k}^{(E)}(\overline{x}_{k}^{(E)},P_{k}^{(E)}))^{{\ast}}\). In fact, we shall show that in LQGDGs with a control-sharing information pattern the optimal strategies are of the form

$$\displaystyle\begin{array}{rcl} u_{k}^{{\ast}}& =& (\gamma _{ k}^{(P)}(\overline{x}_{ k}^{(P)}))^{{\ast}}\,{}\end{array}$$
(8.9)
$$\displaystyle\begin{array}{rcl} v_{k}^{{\ast}}& =& (\gamma _{ k}^{(E)}(\overline{x}_{ k}^{(E)}))^{{\ast}}\,\ \ \ \ \forall \ k,\ \ 0 \leq k \leq N - 1{}\end{array}$$
(8.10)

and are linear.

8.3 Kalman Filtering

LQGDGs with a control-sharing information pattern are Linear-Gaussian (LG) and consequently at decision time k each player can separately calculate his estimate of the physical state \(x_{k}\) using a linear Kalman Filter (KF). Player P runs the KF

$$\displaystyle\begin{array}{rcl} (\overline{x}_{k}^{(P)})^{-}& =& A\overline{x}_{ k-1}^{(P)} + Bu_{ k-1} + Cv_{k-1},\ \ \overline{x}_{0}^{(P)} \equiv \overline{x}_{ 0}^{(P)}{}\end{array}$$
(8.11)
$$\displaystyle\begin{array}{rcl} (P_{k}^{(P)})^{-}& =& AP_{ k-1}^{(P)}A^{T} +\varGamma Q_{ p}\varGamma ^{T},\ \ P_{ 0}^{(P)} \equiv P_{ 0}^{(P)}{}\end{array}$$
(8.12)
$$\displaystyle\begin{array}{rcl} K_{k}^{(P)}& =& (P_{ k}^{(P)})^{-}(H^{(P)})^{T}[H^{(P)}(P_{ k}^{(P)})^{-}(H^{(P)})^{T} + R_{ m}^{(P)}]^{-1}{}\end{array}$$
(8.13)
$$\displaystyle\begin{array}{rcl} \overline{x}_{k}^{(P)}& =& (\overline{x}_{ k}^{(P)})^{-} + K_{ k}^{(P)}[z_{ k}^{(P)} - H^{(P)}(\overline{x}_{ k}^{(P)})^{-}]{}\end{array}$$
(8.14)
$$\displaystyle\begin{array}{rcl} P_{k}^{(P)}& =& (I - K_{ k}^{(P)}H^{(P)})(P_{ k}^{(P)})^{-}{}\end{array}$$
(8.15)

and so at decision time k player P obtains his estimate \(\overline{x}_{k}^{(P)}\) of the state \(x_{k}\). Similarly, player E runs the KF

$$\displaystyle\begin{array}{rcl} (\overline{x}_{k}^{(E)})^{-}& =& A\overline{x}_{ k-1}^{(E)} + Bu_{ k-1} + Cv_{k-1},\ \ \overline{x}_{0}^{(E)} \equiv \overline{x}_{ 0}^{(E)}{}\end{array}$$
(8.16)
$$\displaystyle\begin{array}{rcl} (P_{k}^{(E)})^{-}& =& AP_{ k-1}^{(E)}A^{T} +\varGamma Q_{ p}\varGamma ^{T},\ \ P_{ 0}^{(E)} \equiv P_{ 0}^{(E)}{}\end{array}$$
(8.17)
$$\displaystyle\begin{array}{rcl} K_{k}^{(E)}& =& (P_{ k}^{(E)})^{-}(H^{(E)})^{T}[H^{(E)}(P_{ k}^{(E)})^{-}(H^{(E)})^{T} + R_{ m}^{(E)}]^{-1}{}\end{array}$$
(8.18)
$$\displaystyle\begin{array}{rcl} \overline{x}_{k}^{(E)}& =& (\overline{x}_{ k}^{(E)})^{-} + K_{ k}^{(E)}[z_{ k}^{(E)} - H^{(E)}(\overline{x}_{ k}^{(E)})^{-}]{}\end{array}$$
(8.19)
$$\displaystyle\begin{array}{rcl} P_{k}^{(E)}& =& (I - K_{ k}^{(E)}H^{(E)})(P_{ k}^{(E)})^{-}{}\end{array}$$
(8.20)

and so at decision time k player E obtains his estimate \(\overline{x}_{k}^{(E)}\) of the state \(x_{k}\). The P and E players can calculate their respective state estimation error covariances and Kalman gains \(P_{k}^{(P)}\), \(K_{k+1}^{(P)}\), \(P_{k}^{(E)}\) and \(K_{k+1}^{(E)}\) ahead of time and off line.
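The two filters (8.11)–(8.15) and (8.16)–(8.20) are identical in structure; only the private priors, measurements and measurement models differ, while the shared input history enters both. Below is a minimal NumPy sketch of one filter cycle, written for player P (player E’s cycle is obtained by exchanging the superscripts); the function name and argument layout are illustrative conventions, not taken from the paper.

```python
import numpy as np

def kf_step(xbar_prev, P_prev, u_prev, v_prev, z_k, A, B, C, Gam, Qp, H, Rm):
    """One cycle of player P's Kalman filter (8.11)-(8.15).

    The control-sharing information pattern is what allows the time update to
    use BOTH players' previous controls u_{k-1} and v_{k-1}.
    """
    # Time update (8.11), (8.12)
    xbar_minus = A @ xbar_prev + B @ u_prev + C @ v_prev
    P_minus = A @ P_prev @ A.T + Gam @ Qp @ Gam.T
    # Kalman gain (8.13)
    K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + Rm)
    # Measurement update (8.14), (8.15)
    xbar = xbar_minus + K @ (z_k - H @ xbar_minus)
    P = (np.eye(len(xbar)) - K @ H) @ P_minus
    return xbar, P, K
```

Since the covariances and gains do not depend on the measurement data, the covariance portion of this recursion can indeed be run off line, as noted above.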

In LQGDGs with a control-sharing information pattern the players’ sufficient statistic is their state estimate; the latter is the argument of their strategy functions (8.9) and (8.10). Hence, in the process of countering E’s action, P must compute the statistics of E’s state estimate \(\overline{x}_{k}^{(E)}\), and, vice versa, while planning his move, E must compute the statistics of P’s state estimate \(\overline{x}_{k}^{(P)}\). Momentarily assume the point of view of player P: As far as P is concerned, the unknown to him state estimate of player E at time k, \(\overline{x}_{k}^{(E)}\), is a random variable (and consequently E’s input at time k is a random variable). Similarly, player E will consider the unknown to him state estimate of player P at time k, \(\overline{x}_{k}^{(P)}\), to be a random variable (and consequently P’s input at time k is a random variable). Hence, in the LQGDG with a control-sharing information pattern, at time k player P will estimate E’s state estimate \(\overline{x}_{k}^{(E)}\) using his calculated ownship state estimate \(\overline{x}_{k}^{(P)}\), and vice versa, player E will estimate P’s state estimate \(\overline{x}_{k}^{(P)}\) using his calculated ownship state estimate \(\overline{x}_{k}^{(E)}\). Thus, in the LQGDG with a control-sharing information pattern and with his state estimate \(\overline{x}_{k}^{(P)}\) at time k in hand, player P calculates the statistics of E’s state estimate \(\overline{x}_{k}^{(E)}\), conditional on the public and private information available to him at time k. Similarly, having obtained at time k his state estimate \(\overline{x}_{k}^{(E)}\), player E calculates the statistics of the state estimate \(\overline{x}_{k}^{(P)}\) of player P, conditional on the public and private information available to him at time k. Let’s start at decision time k = 0.

Player P models his measurement/estimate \(\overline{x}_{0}^{(P)}\) of the initial state x 0 as

$$\displaystyle\begin{array}{rcl} \overline{x}_{0}^{(P)} = x_{ 0} + e_{0}^{(P)}\,& &{}\end{array}$$
(8.21)

where \(x_{0}\) is the true physical state and \(e_{0}^{(P)}\) is player P’s measurement/estimation error, whose statistics, in view of Eq. (8.2), are \(e_{0}^{(P)} \sim \mathcal{N}(0,P_{0}^{(P)})\). In addition, player P models player E’s measurement \(\overline{x}_{0}^{(E)}\) of the initial state \(x_{0}\) as

$$\displaystyle\begin{array}{rcl} \overline{x}_{0}^{(E)} = x_{ 0} + e_{0}^{(E)}\,& &{}\end{array}$$
(8.22)

where, as before, \(x_{0}\) is the true physical state and \(e_{0}^{(E)}\) is player E’s measurement/estimation error, whose statistics, which are known to P—see Eq. (8.4)—are \(e_{0}^{(E)} \sim \mathcal{N}(0,P_{0}^{(E)})\). The Gaussian random variables \(e_{0}^{(P)}\) and \(e_{0}^{(E)}\) are independent, by hypothesis. From player P’s point of view, \(\overline{x}_{0}^{(P)}\) is known but \(\overline{x}_{0}^{(E)}\) is a random variable. Subtracting Eq. (8.21) from Eq. (8.22), at time k = 0 player P concludes that, as far as he is concerned, player E’s measurement, upon which E will decide, according to Eq. (8.10), on his optimal control \(v_{0}\), is the random variable

$$\displaystyle\begin{array}{rcl} \overline{x}_{0}^{(E)} = \overline{x}_{ 0}^{(P)} + e_{ 0}^{(E)} - e_{ 0}^{(P)}\,& &{}\end{array}$$
(8.23)

In other words, as far as P is concerned, E’s estimate \(\overline{x}_{0}^{(E)}\) of the initial state x 0 is the Gaussian random variable

$$\displaystyle\begin{array}{rcl} \overline{x}_{0}^{(E)} \sim \mathcal{N}(\overline{x}_{ 0}^{(P)},P_{ 0}^{(P)} + P_{ 0}^{(E)})& &{}\end{array}$$
(8.24)

Thus, player P has used his measurement/private information \(\overline{x}_{0}^{(P)}\) and the public information \(P_{0}^{(P)}\) and \(P_{0}^{(E)}\) to calculate the statistics of the sufficient statistic \(\overline{x}_{0}^{(E)}\) of player E, which is the argument of E’s strategy function \(\gamma_{0}^{(E)}(\cdot)\); the latter, along with P’s control \(u_{0}\), will feature in player P’s cost functional. Similarly, as far as player E is concerned, at time k = 0 the statistics of the sufficient statistic \(\overline{x}_{0}^{(P)}\) of player P are

$$\displaystyle\begin{array}{rcl} \overline{x}_{0}^{(P)} \sim \mathcal{N}(\overline{x}_{ 0}^{(E)},P_{ 0}^{(P)} + P_{ 0}^{(E)})& &{}\end{array}$$
(8.25)
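Equation (8.24) follows from Eq. (8.23) because \(e_{0}^{(P)}\) and \(e_{0}^{(E)}\) are independent and zero mean, so the covariance of their difference is the sum \(P_{0}^{(P)} + P_{0}^{(E)}\). A short Monte-Carlo sketch (with arbitrary illustrative covariances, not taken from the paper) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2
P0P = np.array([[0.2, 0.0], [0.0, 0.1]])    # public prior covariance of player P
P0E = np.array([[0.3, 0.05], [0.05, 0.2]])  # public prior covariance of player E

# e_0^(P) and e_0^(E) are independent by hypothesis, Eqs. (8.21)-(8.22).
eP = rng.multivariate_normal(np.zeros(n), P0P, size=200_000)
eE = rng.multivariate_normal(np.zeros(n), P0E, size=200_000)

diff = eE - eP                               # = xbar_0^(E) - xbar_0^(P), Eq. (8.23)
print(np.cov(diff, rowvar=False))            # empirically close to P0P + P0E, Eq. (8.24)
print(P0P + P0E)
```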

Similar to the case where k = 0, as far as player P is concerned the state estimate of player E at decision time k ≥ 1 is the random variable

$$\displaystyle\begin{array}{rcl} \overline{x}_{k}^{(E)} = \overline{x}_{ k}^{(P)} + e_{ k}^{(E)} - e_{ k}^{(P)},& & {}\\ \end{array}$$

that is, at decision time k player P believes that the state estimate \(\overline{x}_{k}^{(E)}\) of player E is

$$\displaystyle\begin{array}{rcl} \overline{x}_{k}^{(E)} \sim \mathcal{N}(\overline{x}_{ k}^{(P)},P_{ k}^{(E,P)})& &{}\end{array}$$
(8.26)

where the covariance matrix

$$\displaystyle\begin{array}{rcl} P_{k}^{(E,P)}& \equiv & E(\ (e_{ k}^{(E)} - e_{ k}^{(P)})(e_{ k}^{(E)} - e_{ k}^{(P)})^{T}\ ) {}\\ & =& P_{k}^{(P)} + P_{ k}^{(E)} - E(\ e_{ k}^{(P)}(e_{ k}^{(E)})^{T}\ ) - (E(\ e_{ k}^{(P)}(e_{ k}^{(E)})^{T}\ ))^{T} {}\\ \end{array}$$

At the decision time instants \(k = 1,\ldots,N - 1\) the P and E players’ respective state estimation errors \(e_{k}^{(P)}\) and \(e_{k}^{(E)}\) are now correlated—this is caused by the process dynamics being driven in part by process noise.

Similarly, as far as he is concerned, player E believes that at decision time k the state estimate \(\overline{x}_{k}^{(P)}\) of player P is the random variable

$$\displaystyle\begin{array}{rcl} \overline{x}_{k}^{(P)} \sim \mathcal{N}(\overline{x}_{ k}^{(E)},P_{ k}^{(E,P)})& &{}\end{array}$$
(8.27)

Concerning decision time k ≥ 1: Let the covariance matrix

$$\displaystyle\begin{array}{rcl} \tilde{P}_{k}^{(E,P)} \equiv E(\ e_{ k}^{(P)}(e_{ k}^{(E)})^{T}\ )& &{}\end{array}$$
(8.28)

It can be shown that the recursion for the correlation matrix \(\tilde{P}_{k}^{(E,P)}\) is

$$\displaystyle\begin{array}{rcl} & & \tilde{P}_{k+1}^{(E,P)} = (I - K_{k+1}^{(P)}H^{(P)})(A\tilde{P}_{k}^{(P,E)}A^{T} +\varGamma Q_{p}\varGamma ^{T})(I - K_{k+1}^{(E)}H^{(E)})^{T},\ \ \tilde{P}_{0}^{(P,E)} = 0,\ \ k = 0,\ldots,N - 1 {}\end{array}$$
(8.29)

In summary, at decision time \(k = 0,\ldots,N - 1\) player P believes that the statistics of E’s estimate \(\overline{x}_{k}^{(E)}\) of the state \(x_{k}\) are given by Eq. (8.26) and player E believes that the statistics of P’s estimate \(\overline{x}_{k}^{(P)}\) of the state \(x_{k}\) are given by Eq. (8.27) where

$$\displaystyle\begin{array}{rcl} P_{k}^{(E,P)} = P_{ k}^{(P)} + P_{ k}^{(E)} -\tilde{ P}_{ k}^{(E,P)} - (\tilde{P}_{ k}^{(E,P)})^{T}& & {}\\ \end{array}$$

The KF covariance matrices \(P_{k}^{(P)}\), \(P_{k}^{(E)}\) and \(\tilde{P}_{k}^{(E,P)}\) are calculated ahead of time by solving the respective recursions (8.12), (8.13), (8.15); (8.17), (8.18), (8.20); and (8.29).
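All three covariance recursions can be propagated jointly in a single off-line pass. A minimal sketch, assuming time-invariant matrices as in this section; the function name and the returned triple \((P_{k}^{(P)}, P_{k}^{(E)}, P_{k}^{(E,P)})\) are illustrative conventions rather than notation from the paper:

```python
import numpy as np

def offline_covariances(A, Gam, Qp, HP, RmP, HE, RmE, P0P, P0E, N):
    """Propagate P_k^(P), P_k^(E) via (8.12)-(8.15), (8.17)-(8.20), the cross
    term via (8.29), and assemble P_k^(E,P) for k = 0,...,N-1."""
    n = A.shape[0]
    PP, PE, Pt = P0P, P0E, np.zeros((n, n))       # Ptilde_0 = 0: independent priors
    out = [(PP, PE, PP + PE - Pt - Pt.T)]
    for _ in range(N - 1):
        PPm = A @ PP @ A.T + Gam @ Qp @ Gam.T     # predicted covariances
        PEm = A @ PE @ A.T + Gam @ Qp @ Gam.T
        KP = PPm @ HP.T @ np.linalg.inv(HP @ PPm @ HP.T + RmP)
        KE = PEm @ HE.T @ np.linalg.inv(HE @ PEm @ HE.T + RmE)
        PP = (np.eye(n) - KP @ HP) @ PPm
        PE = (np.eye(n) - KE @ HE) @ PEm
        # Cross-covariance recursion (8.29)
        Pt = (np.eye(n) - KP @ HP) @ (A @ Pt @ A.T + Gam @ Qp @ Gam.T) @ (np.eye(n) - KE @ HE).T
        out.append((PP, PE, PP + PE - Pt - Pt.T))
    return out
```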

Finally, since in LQGDGs with a control-sharing information pattern the sufficient statistic is the players’ state estimate, upon employing the method of Dynamic Programming, at decision time k player P must project ahead the estimate of the physical state \(x_{k+1}\) that the Kalman filtering algorithm will provide at time k + 1. It can be shown that at time k player P believes that his future state estimate at time k + 1 will be the Gaussian random variable

$$\displaystyle\begin{array}{rcl} & & \overline{x}_{k+1}^{(P)} = A\overline{x}_{k}^{(P)} + Bu_{k} + C\gamma _{k}^{(E)}(\overline{x}_{k}^{(P)} + e_{k}^{(P)} - e_{k}^{(E)}) + K_{k+1}^{(P)}(H^{(P)}\varGamma w_{k} + v_{k+1}^{(P)} - H^{(P)}Ae_{k}^{(P)}){}\end{array}$$
(8.30)

Similarly, at decision time k player E’s estimate of the state \(x_{k+1}\) at time k + 1 will be the Gaussian random variable

$$\displaystyle\begin{array}{rcl} & & \overline{x}_{k+1}^{(E)} = A\overline{x}_{k}^{(E)} + B\gamma _{k}^{(P)}(\overline{x}_{k}^{(E)} + e_{k}^{(E)} - e_{k}^{(P)}) + Cv_{k} + K_{k+1}^{(E)}(H^{(E)}\varGamma w_{k} + v_{k+1}^{(E)} - H^{(E)}Ae_{k}^{(E)}){}\end{array}$$
(8.31)

8.4 End Game

In the best tradition of backward induction/Dynamic Programming, the terminal stage of the game, namely, the players’ decision time \(k = N - 1\) is analyzed first. In the end game the cost/payoff function is

$$\displaystyle\begin{array}{rcl} J_{N-1}(u_{N-1},v_{N-1};x_{N-1})& =& x_{N}^{T}Q_{ F}x_{N} + x_{N}^{T}Qx_{ N} + u_{N-1}^{T}R_{ u}u_{N-1} - v_{N-1}^{T}R_{ v}v_{N-1} {}\\ & =& x_{N}^{T}(Q + Q_{ F})x_{N} + u_{N-1}^{T}R_{ u}u_{N-1} - v_{N-1}^{T}R_{ v}v_{N-1} {}\\ \end{array}$$

It is convenient to momentarily set \(Q_{F}:= Q + Q_{F}\) whereupon the terminal cost/payoff

$$\displaystyle\begin{array}{rcl} J_{N-1}(u_{N-1},v_{N-1};x_{N-1}) = x_{N}^{T}Q_{ F}x_{N} + u_{N-1}^{T}R_{ u}u_{N-1} - v_{N-1}^{T}R_{ v}v_{N-1}& &{}\end{array}$$
(8.32)

The players’ sufficient statistics in this LG game are the expectation of the physical state and the covariance of the state’s estimation error: having run their respective Kalman filters during the time interval [1, N − 1], at decision time N − 1 the information available to player P is \((\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\) and the information of player E is \((\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\). In other words, at decision time N − 1 player P believes the physical state \(x_{N-1}\) to be \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\) whereas player E believes the physical state \(x_{N-1}\) to be \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\). This is tantamount to stipulating that players P and E took separate measurements of the state \(x_{N-1}\). The quality of the players’ “instruments” used to take the measurements and also the degree of correlation of the players’ measurement errors is public knowledge—we refer to the measurement error covariances \(P_{N-1}^{(P)}\), \(P_{N-1}^{(E)}\) and \(\tilde{P}_{N-1}^{(E,P)}\). At the same time, the recorded measurements \(\overline{x}_{N-1}^{(P)}\) and \(\overline{x}_{N-1}^{(E)}\) are the private information of the respective players P and E: the “measurement” \(\overline{x}_{N-1}^{(E)}\) recorded by player E is his private information and is not shared with player P. Thus, player P has partial information. Similarly, the “measurement” \(\overline{x}_{N-1}^{(P)}\) recorded by player P is his private information and is not shared with player E, so player E also has partial information.

To gain a better appreciation of the informational issues in games with partial information, it is instructive to briefly digress and employ an “appealing” approach which is familiar to workers in deterministic control and which, unfortunately, is an approach sometimes employed in stochastic games. We now intentionally “take a wrong turn” which quickly leads us to a dead end. A correct analysis of the informational situation at hand follows.

Consider the following flawed argument: At time N − 1 the state information available to player P is \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\) and thus Player P calculates the expectation of his cost function

$$\displaystyle\begin{array}{rcl} \overline{J}_{N-1}^{(P)}(u_{ N-1},v_{N-1};\overline{x}_{N-1}^{(P)},P_{ N-1}^{(P)})& \equiv & E_{ x_{N-1}}\ (J(u_{N-1},v_{N-1};x_{N-1})\mid \overline{x}_{N-1}^{(P)},P_{ N-1}^{(P)}) \\ & =& (\overline{x}^{(P)})_{ N-1}^{T}A^{T}Q_{ F}A\overline{x}_{N-1}^{(P)} + Trace(A^{T}Q_{ F}AP_{N-1}^{(P)}) \\ & +& u_{N-1}^{T}(R_{ u} + B^{T}Q_{ F}B)u_{N-1} - v_{N-1}^{T}(R_{ v} - C^{T}Q_{ F}C)v_{N-1} \\ & +& 2u_{N-1}^{T}B^{T}Q_{ F}A\overline{x}_{N-1}^{(P)} + 2v_{ N-1}^{T}C^{T}Q_{ F}A\overline{x}_{N-1}^{(P)} \\ & +& 2u_{N-1}^{T}B^{T}Q_{ F}Cv_{N-1} + Trace(\varGamma ^{T}Q_{ F}\varGamma Q_{p}) {}\end{array}$$
(8.33)

At the same time the state information available to player E is \(x_{N-1} \sim \mathcal{N}(\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\) and Player E calculates the expectation of his payoff function

$$\displaystyle\begin{array}{rcl} \overline{J}_{N-1}^{(E)}(u_{ N-1},v_{N-1};\overline{x}_{N-1}^{(E)},P_{ N-1}^{(E)})& \equiv & E_{ x_{N-1}}\ (J(u_{N-1},v_{N-1};x_{N-1})\mid \overline{x}_{N-1}^{(E)},P_{ N-1}^{(E)}) \\ & =& (\overline{x}^{(E)})_{ N-1}^{T}A^{T}Q_{ F}A\overline{x}_{N-1}^{(E)} + Trace(A^{T}Q_{ F}AP_{N-1}^{(E)}) \\ & +& u_{N-1}^{T}(R_{ u} + B^{T}Q_{ F}B)u_{N-1} - v_{N-1}^{T}(R_{ v} - C^{T}Q_{ F}C)v_{N-1} \\ & +& 2u_{N-1}^{T}B^{T}Q_{ F}A\overline{x}_{N-1}^{(E)} + 2v_{ N-1}^{T}C^{T}Q_{ F}A\overline{x}_{N-1}^{(E)} \\ & +& 2u_{N-1}^{T}B^{T}Q_{ F}Cv_{N-1} + Trace(\varGamma ^{T}Q_{ F}\varGamma Q_{p}) {}\end{array}$$
(8.34)

Now Player P’s optimization, that is, the differentiation of his deterministic cost function (8.33), yields the relationship

$$\displaystyle\begin{array}{rcl} (R_{u} + B^{T}Q_{ F}B)u_{N-1} + B^{T}Q_{ F}Cv_{N-1} = -B^{T}Q_{ F}A\overline{x}_{N-1}^{(P)}& &{}\end{array}$$
(8.35)

and Player E’s optimization, that is, the differentiation of his deterministic payoff function (8.34), yields the relationship

$$\displaystyle\begin{array}{rcl} C^{T}Q_{ F}Bu_{N-1} - (R_{v} - C^{T}Q_{ F}C)v_{N-1} = -C^{T}Q_{ F}A\overline{x}_{N-1}^{(E)}& &{}\end{array}$$
(8.36)

We have obtained two equations in the players’ optimal controls, namely the two unknowns \(u_{N-1}^{\ast}\) and \(v_{N-1}^{\ast}\), which players P and E must separately solve in order to calculate their respective optimal controls. However, player P cannot solve the set of two equations (8.35) and (8.36) because he does not know the “measurement” \(\overline{x}_{N-1}^{(E)}\) of E, and player E cannot solve this set of two equations because he does not know the “measurement” \(\overline{x}_{N-1}^{(P)}\) of P—both players have reached a dead end and it would appear that all that is left to do is to try to guess and outguess the opponent’s “measurement”. This state of affairs is caused by the players having partial information. This approach brings on the much maligned infinite regress in reciprocal reasoning! Unfortunately, this flawed approach is not foreign to the literature on dynamic stochastic games and it leads to erroneous “results”—see Aoki (1973) where, using this flawed argument, the LQGDG with a control-sharing information pattern was “solved” and complicated “strategies” were computed.

We now change course and undertake a correct analysis of our LQGDG with a shared-control information pattern. To this end, it is imperative that one thinks in strategic terms. The strategies available to player P are mappings \(f: R^{n} \rightarrow R^{m_{u}}\) from his information set into his actions set; thus, the action of player P is \(u_{N-1} = f(\overline{x}_{N-1}^{(P)},P_{N-1}^{(P)})\). Similarly, the strategies available to player E are mappings \(g: R^{n} \rightarrow R^{m_{v}}\) from his information set into his actions set—thus, the action of player E is \(v_{N-1} = g(\overline{x}_{N-1}^{(E)},P_{N-1}^{(E)})\). However, we’ll show in the sequel that it suffices to consider P and E strategies of the form (8.9) and (8.10), respectively.

It is now important to realize that from player P’s vantage point, the action v N−1 of player E is a random variable. This is so because as far as player P is concerned the measurement \(\overline{x}_{N-1}^{(E)}\) of player E used in (8.10) to form his control v N−1 is a random variable. Similarly, from player E’s vantage point, the action u N−1 of player P is also a function of a random variable, \(\overline{x}_{N-1}^{(P)}\).

Consider the decision process of player P whose private information is \(\overline{x}_{N-1}^{(P)}\). He operates against the strategy g(⋅ ) of player E. Therefore, from player P’s perspective, the random variables at work are \(x_{N-1}\) and \(\overline{x}_{N-1}^{(E)}\). At decision time \(k = N - 1\) player P is confronted with a stochastic optimization problem and he calculates the expectation of the cost function (8.32), conditional on his private information \(\overline{x}_{N-1}^{(P)}\),

$$\displaystyle\begin{array}{rcl} \overline{J}^{(P)}(u_{ N-1},g(\cdot );\overline{x}_{N-1}^{(P)}) \equiv E_{ x_{N-1},\overline{x}_{N-1}^{(E)}}\ (J(u_{N-1},g(\overline{x}_{N-1}^{(E)});x_{ N-1})\mid \overline{x}_{N-1}^{(P)})& &{}\end{array}$$
(8.37)

By correctly using in the calculation of his expected cost (8.37) player E’s strategy function \(g(\overline{x}_{N-1}^{(E)})\) rather than, as before, player E’s control \(v_{N-1}\), player P has eliminated the possibility of an infinite regress in reciprocal reasoning. This is so because P now has all the information needed to be able, in principle, to calculate the said expectation. Thus, player P calculates his expected cost

$$\displaystyle\begin{array}{rcl} \overline{J}^{(P)}(u_{ N-1},g(\cdot );\overline{x}_{N-1}^{(P)})& =& (\overline{x}_{ N-1}^{(P)})^{T}A^{T}Q_{ F}A\overline{x}_{N-1}^{(P)} + Trace(A^{T}Q_{ F}AP_{N-1}^{(P)}) \\ & +& u_{N-1}^{T}(R_{ u} + B^{T}Q_{ F}B)u_{N-1} + 2u_{N-1}^{T}B^{T}Q_{ F}A\overline{x}_{N-1}^{(P)} \\ & +& 2E_{x_{N-1},\overline{x}_{N-1}^{(E)}}\ (g^{T}(\overline{x}_{ N-1}^{(E)})C^{T}Q_{ F}Ax_{N-1}\mid \overline{x}_{N-1}^{(P)}) \\ & -& E_{\overline{x}_{N-1}^{(E)}}\ (g^{T}(\overline{x}_{ N-1}^{(E)})(R_{ v} - C^{T}Q_{ F}C)g(\overline{x}_{N-1}^{(E)})\mid \overline{x}_{ N-1}^{(P)}) \\ & +& 2u_{N-1}^{T}B^{T}Q_{ F}CE_{\overline{x}_{N-1}^{(E)}}\ (g(\overline{x}_{N-1}^{(E)})\mid \overline{x}_{ N-1}^{(P)}) \\ & +& Trace(\varGamma ^{T}Q_{ F}\varGamma Q_{p}) {}\end{array}$$
(8.38)

Player P calculates the expectations with respect to the random variable \(\overline{x}_{N-1}^{(E)}\) which features in Eq. (8.38), cognizant that \(\overline{x}_{N-1}^{(E)} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},P_{N-1}^{(E,P)})\). In this game with partial information, player P is using his measurement/private information \(\overline{x}_{N-1}^{(P)}\) and the public information to estimate the sufficient statistic \(\overline{x}_{N-1}^{(E)}\) of player E, which is the argument of E’s strategy function g(⋅ ); the latter features in player P’s cost functional (8.38) and thus enters the calculation of P’s cost.

The careful analysis of the optimization problem at hand leads to a Fredholm equation of the second kind of the convolution type with a kernel which is a Gaussian function; the unknown functions are the players’ optimal strategies. Taking the point of view of player E yields a similar Fredholm integral equation in the players’ optimal strategies. The solution of the set of two Fredholm equations yields the optimal strategies of players P and E. The optimal strategies turn out to be linear after all! The reader is referred to reference Pachter (2013) for the complete derivation.
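Once linearity is established, the conditional expectations in (8.38) reduce to standard Gaussian moment identities. As an illustration (this is textbook material, not part of the cited derivation), for a linear strategy \(g(\overline{x}) = F\overline{x}\) and with \(\overline{x}_{N-1}^{(E)} \sim \mathcal{N}(\overline{x}_{N-1}^{(P)},P_{N-1}^{(E,P)})\),

$$\displaystyle\begin{array}{rcl} E_{\overline{x}_{N-1}^{(E)}}\ (g(\overline{x}_{N-1}^{(E)})\mid \overline{x}_{N-1}^{(P)})& =& F\overline{x}_{N-1}^{(P)}, {}\\ E_{\overline{x}_{N-1}^{(E)}}\ (g^{T}(\overline{x}_{N-1}^{(E)})Mg(\overline{x}_{N-1}^{(E)})\mid \overline{x}_{N-1}^{(P)})& =& (\overline{x}_{N-1}^{(P)})^{T}F^{T}MF\overline{x}_{N-1}^{(P)} + Trace(F^{T}MFP_{N-1}^{(E,P)}) {}\\ \end{array}$$

for any symmetric weighting matrix M, for example \(M = R_{v} - C^{T}Q_{F}C\). This is why, once the Fredholm system forces linear strategies, P’s expected cost (8.38) collapses to a quadratic form in \(\overline{x}_{N-1}^{(P)}\) plus trace terms, in line with the value functions postulated in Sect. 8.5.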

8.5 Dynamic Programming

We consider the LQGDG (8.1)–(8.6) with a control-sharing information pattern as in Aoki (1973). The planning horizon is N ≥ 2.

8.5.1 Sufficient Statistics

The initial state information and the measurements of players P and E are their private information but their past controls are shared information. Even though the players have partial information, because the initial state information and their measurements are not shared, from the point of view of both players P and E the control system is nevertheless Linear Gaussian (LG). This is so because at decision time k the adversary’s information state components \(v_{0},\ldots,v_{k-1}\) and \(u_{0},\ldots,u_{k-1}\) are not random variables with unknown p.d.f.s but are known to the respective players. The LQGDG with a control-sharing information pattern is LG, therefore the conditions for the players’ information states to be Gaussian hold, and at decision time k the sufficient statistics of P and E are \(\overline{x}_{k}^{(P)}\) and \(\overline{x}_{k}^{(E)}\), respectively. Furthermore, as far as player P is concerned, at time k the sufficient statistic \(\overline{x}_{k}^{(E)}\) of player E is the random variable \(\overline{x}_{k}^{(E)} \sim \mathcal{N}(\overline{x}_{k}^{(P)},P_{k}^{(E,P)})\) and he uses this information in the calculation of his cost-to-go/value function at time k. Similarly, player E considers the sufficient statistic \(\overline{x}_{k}^{(P)}\) of player P to be \(\overline{x}_{k}^{(P)} \sim \mathcal{N}(\overline{x}_{k}^{(E)},P_{k}^{(E,P)})\) and uses this information in the calculation of his cost-to-go/value function at time k.

8.5.2 Analysis

The analysis is along the lines of the analysis of the static LQG game with partial information (Pachter 2013) and the analysis of the end game in Sect. 8.4 where \(k = N - 1\). We shall require

Proposition 1.

The value functions of players P and E are quadratic in their respective sufficient statistics \(\overline{x}_{k}^{(P)}\) and \(\overline{x}_{k}^{(E)}\) , that is

$$\displaystyle\begin{array}{rcl} V _{k}^{(P)}(\overline{x}_{k}^{(P)},P_{k}^{(P)};P_{k}^{(E)},\tilde{P}_{k}^{(E,P)})& =& (\overline{x}_{k}^{(P)})^{T}\varPi _{k}\overline{x}_{k}^{(P)} + c_{k}^{(P)}(P_{k}^{(P)};P_{k}^{(E)},\tilde{P}_{k}^{(E,P)}),\ \ k = 0,1,\ldots,N - 1, {}\\ V _{k}^{(E)}(\overline{x}_{k}^{(E)},P_{k}^{(E)};P_{k}^{(P)},\tilde{P}_{k}^{(E,P)})& =& (\overline{x}_{k}^{(E)})^{T}\varPi _{k}\overline{x}_{k}^{(E)} + c_{k}^{(E)}(P_{k}^{(E)};P_{k}^{(P)},\tilde{P}_{k}^{(E,P)}),\ \ k = 0,1,\ldots,N - 1 {}\\ \end{array}$$

where

\(\varPi _{k}\) are \(n \times n\) real symmetric matrices and the scalars \(c_{k}^{(P)},c_{k}^{(E)} \in R^{1}\), \(k = 0,\ldots,N\).

Similar to the correct approach outlined in Sect. 8.4 we calculate the value functions by taking the expectations over the relevant random variables.

$$\displaystyle\begin{array}{rcl} V _{k}^{(P)}(\overline{x}_{ k}^{(P)},P_{ k}^{(P)};P_{ k}^{(E)},\tilde{P}_{ k}^{(E,P)})& =& min_{ u_{k}}\{u_{k}^{T}[R_{ u} + B^{T}(Q +\varPi _{ k+1})B]u_{k} \\ & & +2u_{k}^{T}B^{T}(Q +\varPi _{ k+1})(A\overline{x}_{k}^{(P)} \\ & +& CE_{\tilde{w}}\ (\gamma _{k}^{(E)}(\overline{x}_{ k}^{(P)} +\tilde{ w})))\} + (\overline{x}_{ k}^{(P)})^{T}A^{T}(Q +\varPi _{ k+1})A\overline{x}_{k}^{(P)} \\ & -& E_{\tilde{w}}\ ((\gamma _{k}^{(E)}(\overline{x}_{ k}^{(P)} +\tilde{ w}))^{T} \\ & & [R_{v} - C^{T}(Q +\varPi _{ k+1})C]\gamma _{k}^{(E)}(\overline{x}_{ k}^{(P)} +\tilde{ w})) \\ & +& 2E_{\tilde{w}}\ ((\gamma _{k}^{(E)}(\overline{x}_{ k}^{(P)} +\tilde{ w}))^{T})C^{T}(Q +\varPi _{ k+1})A\overline{x}_{k}^{(P)} \\ & -& 2E_{e_{k}^{(E)},e_{k}^{(P)}}\ ((\gamma _{k}^{(E)}(\overline{x}_{ k}^{(P)} + e_{ k}^{(E)} - e_{ k}^{(P)}))^{T}C^{T}(Q \\ & +& \varPi _{k+1}K_{k+1}^{(P)}H^{(P)})Ae_{ k}^{(P)}) + Trace(A^{T}QAP_{ k}^{(P)}) \\ & +& Trace(\varGamma ^{T}Q\varGamma Q_{ p}) + Trace((K_{k+1}^{(P)})^{T}\varPi _{ k+1}K_{k+1}^{(P)}R_{ m}^{(P)}) \\ & +& Trace(\varGamma ^{T}(H^{(P)})^{T}(K_{ k+1}^{(P)})^{T}\varPi _{ k+1}K_{k+1}^{(P)}H^{(P)}\varGamma Q_{ p}) \\ & +& Trace(A^{T}(H^{(P)})^{T}(K_{ k+1}^{(P)})^{T}\varPi _{ k+1}K_{k+1}^{(P)}H^{(P)}AP_{ k}^{(P)}) \\ & +& c_{k+1}^{(P)}(P_{ k+1}^{(P)};P_{ k+1}^{(E)},\tilde{P}_{ k+1}^{(E,P)}), {}\end{array}$$
(8.39)
$$\displaystyle\begin{array}{rcl} V _{k}^{(E)}(\overline{x}_{ k}^{(E)},P_{ k}^{(E)};P_{ k}^{(P)},\tilde{P}_{ k}^{(E,P)})& =& max_{ v_{k}}\{ - v_{k}^{T}[R_{ v} - C^{T}(Q +\varPi _{ k+1})C]v_{k} \\ & & +2v_{k}^{T}C^{T}(Q +\varPi _{ k+1})(A\overline{x}_{k}^{(E)} \\ & +& BE_{\tilde{w}}\ (\gamma _{k}^{(P)}(\overline{x}_{ k}^{(E)} -\tilde{ w})))\} + (\overline{x}_{ k}^{(E)})^{T}A^{T}(Q +\varPi _{ k+1})A\overline{x}_{k}^{(E)} \\ & +& E_{\tilde{w}}\ ((\gamma _{k}^{(P)}(\overline{x}_{ k}^{(E)} -\tilde{ w}))^{T} \\ & & [R_{u} + B^{T}(Q +\varPi _{ k+1})B]\gamma _{k}^{(P)}(\overline{x}_{ k}^{(E)} -\tilde{ w})) \\ & +& 2E_{\tilde{w}}\ ((\gamma _{k}^{(P)}(\overline{x}_{ k}^{(E)} -\tilde{ w}))^{T})B^{T}(Q +\varPi _{ k+1})A\overline{x}_{k}^{(E)} \\ & -& 2E_{e_{k}^{(E)},e_{k}^{(P)}}\ ((\gamma _{k}^{(P)}(\overline{x}_{ k}^{(E)} - e_{ k}^{(E)} + e_{ k}^{(P)}))^{T}B^{T}(Q \\ & +& \varPi _{k+1}K_{k+1}^{(E)}H^{(E)})Ae_{ k}^{(E)}) + Trace(A^{T}QAP_{ k}^{(E)}) \\ & +& Trace(\varGamma ^{T}Q\varGamma Q_{ p}) + Trace((K_{k+1}^{(E)})^{T}\varPi _{ k+1}K_{k+1}^{(E)}R_{ m}^{(E)}) \\ & +& Trace(\varGamma ^{T}(H^{(E)})^{T}(K_{ k+1}^{(E)})^{T}P_{ k+1}K_{k+1}^{(E)}H^{(E)}\varGamma Q_{ p}) \\ & +& Trace(A^{T}(H^{(E)})^{T}(K_{ k+1}^{(E)})^{T}\varPi _{ k+1}K_{k+1}^{(E)}H^{(E)}AP_{ k}^{(E)}) \\ & +& c_{k+1}^{(E)}(P_{ k+1}^{(P)};P_{ k+1}^{(E)},\tilde{P}_{ k+1}^{(E,P)}) {}\end{array}$$
(8.40)

where the random variable \(\tilde{w} \equiv e_{k}^{(P)} - e_{k}^{(E)} \sim \mathcal{N}(0,P_{k}^{(E,P)})\).

8.5.3 Optimization

Consider the minimization problem faced by P at decision time 0 ≤ k ≤ N − 2: differentiating the RHS of Eq. (8.39) in his control \(u_{k}\), he obtains the optimality condition

$$\displaystyle\begin{array}{rcl} u_{k}^{{\ast}}& =& -[R_{ u} + B^{T}(Q +\varPi _{ k+1})B]^{-1}B^{T}(Q +\varPi _{ k+1})(A\overline{x}_{k}^{(P)} + CE_{\tilde{ w}}(\ (\gamma _{k}^{(E)}(\overline{x}_{ k}^{(P)} +\tilde{ w})\ )), \\ k& =& 0,1,\ldots,N - 1 {}\end{array}$$
(8.41)

and similarly, upon differentiating the RHS of Eq. (8.40) in \(v_{k}\), player E obtains

$$\displaystyle\begin{array}{rcl} v_{k}^{{\ast}}& =& [R_{ v} - C^{T}(Q +\varPi _{ k+1})C]^{-1}C^{T}(Q +\varPi _{ k+1})(A\overline{x}_{k}^{(E)} + BE_{\tilde{ w}}(\ (\gamma _{k}^{(P)}(\overline{x}_{ k}^{(E)} -\tilde{ w})\ )), \\ k& =& 0,1,\ldots,N - 1 {}\end{array}$$
(8.42)

Player P has obtained an expression for his optimal control \(u_{k}^{\ast}\) in which \(\overline{x}_{k}^{(E)}\) does not feature and \(u_{k}^{\ast}\) is a function of the parameter \(\overline{x}_{k}^{(P)}\) only. However, the strategy function \(\gamma_{k}^{(E)}(\cdot)\) of player E features in this equation. Indeed, the following strategic relationship holds

$$\displaystyle\begin{array}{rcl} (\gamma _{k}^{(P)}(\overline{x}_{ k}^{(P)}))^{{\ast}}& =& -[R_{ u} + B^{T}(Q +\varPi _{ k+1})B]^{-1}B^{T}(Q +\varPi _{ k+1})(A\overline{x}_{k}^{(P)} \\ & +& CE_{\tilde{w}}(\ ((\gamma _{k}^{(E)}(\overline{x}_{ k}^{(P)} +\tilde{ w}))^{{\ast}}\ )),\ \ \ k = 0,1,\ldots,N - 1{}\end{array}$$
(8.43)

We have obtained an expression for P’s optimal strategy function \((\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)}))^{{\ast}}\) in terms of the strategy \(\gamma_{k}^{(E)}(\cdot)\) of player E. Player P has obtained a linear relationship which directly ties together the as yet unknown optimal strategies \((\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)}))^{{\ast}}\) and \((\gamma _{k}^{(E)}(\overline{x}_{k}^{(E)}))^{{\ast}}\) of players P and E. Similarly, player E also obtains a linear relationship among the players’ optimal strategies:

$$\displaystyle\begin{array}{rcl} (\gamma _{k}^{(E)}(\overline{x}_{ k}^{(E)}))^{{\ast}}& =& [R_{ v} - C^{T}(Q +\varPi _{ k+1})C]^{-1}C^{T}(Q +\varPi _{ k+1})(A\overline{x}_{k}^{(E)} \\ & +& BE_{\tilde{w}}(\ ((\gamma _{k}^{(P)}(\overline{x}_{ k}^{(E)} -\tilde{ w}))^{{\ast}}\ )),\ \ \ k = 0,1,\ldots,N - 1{}\end{array}$$
(8.44)

Equations (8.43) and (8.44) constitute a linear system of Fredholm integral equations of the second kind in the players’ optimal strategies \((\gamma _{k}^{(P)}(\overline{x}_{k}^{(P)}))^{{\ast}}\) and \((\gamma _{k}^{(E)}(\overline{x}_{k}^{(E)}))^{{\ast}}\). Similar to the analysis in reference Pachter (2013), the solution of the linear system of Fredholm integral equations of the second kind, Eqs. (8.43) and (8.44), yields the optimal strategies which are linear in the players’ sufficient statistics, namely

$$\displaystyle\begin{array}{rcl} \gamma _{k}^{(P)}(\overline{x}_{ k}^{(P)}) = F_{ k}^{(P)} \cdot \overline{x}_{ k}^{(P)},\ \ \ \gamma _{ k}^{(E)}(\overline{x}_{ k}^{(E)}) = F_{ k}^{(E)} \cdot \overline{x}_{ k}^{(E)}& & {}\\ \end{array}$$

and the formulae for the optimal gains are

$$\displaystyle\begin{array}{rcl} (F_{k}^{(P)})^{{\ast}}& =& -S_{ B}^{-1}(Q +\varPi _{ k+1})B^{T}(Q +\varPi _{ k+1})\{I + C[R_{v} - C^{T}(Q +\varPi _{ k+1})C]^{-1}C^{T}(Q \\ & +& \varPi _{k+1})\}A,\ \ \ \ \ \ \ \ \ \ k = 0,\ldots,N - 1 {}\end{array}$$
(8.45)
$$\displaystyle\begin{array}{rcl} (F_{k}^{(E)})^{{\ast}}& =& -S_{ C}^{-1}(Q +\varPi _{ k+1})C^{T}(Q +\varPi _{ k+1})\{I - B[R_{u} + B^{T}(Q +\varPi _{ k+1})B]^{-1}B^{T}(Q \\ & +& \varPi _{k+1})\}A,\ \ \ \ \ \ \ \ \ \ k = 0,\ldots,N - 1 {}\end{array}$$
(8.46)

The control system is Linear-Gaussian (LG) and therefore the players’ information states are Gaussian. Time consistency in this dynamic game is provided by the application of the method of Dynamic Programming (DP), where the DP state is the information state, and, by construction, the strategies are Person-By-Person Satisfactory (PBPS). Hence, in the LQGDG with a control-sharing information pattern a Nash equilibrium is obtained—as was also the case in the static LQG game with partial information (Pachter 2013).
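For computation, it is not necessary to spell out the Appendix shorthands \(S_{B}\) and \(S_{C}\) appearing in (8.45) and (8.46): substituting linear strategies \(\gamma_{k}^{(P)}(\overline{x}) = F_{k}^{(P)}\overline{x}\) and \(\gamma_{k}^{(E)}(\overline{x}) = F_{k}^{(E)}\overline{x}\) into (8.43) and (8.44) (the zero-mean \(\tilde{w}\) drops out of the expectations) turns the two strategic relationships into a block-linear system in the gains, which can be solved directly. A minimal NumPy sketch, with M standing for \(Q + \varPi_{k+1}\); the function name is illustrative:

```python
import numpy as np

def optimal_gains(A, B, C, Ru, Rv, M):
    """Solve the coupled conditions (8.43)-(8.44) for linear strategies
    u_k = FP @ xbar_k^(P), v_k = FE @ xbar_k^(E), where M = Q + Pi_{k+1}.
    Numerically equivalent to the closed forms (8.45)-(8.46)."""
    mu, mv = B.shape[1], C.shape[1]
    # Block system in (FP, FE):
    #   (Ru + B'MB) FP + (B'MC) FE     = -B'MA
    #   (C'MB) FP - (Rv - C'MC) FE     = -C'MA
    lhs = np.block([[Ru + B.T @ M @ B, B.T @ M @ C],
                    [C.T @ M @ B, -(Rv - C.T @ M @ C)]])
    rhs = -np.vstack([B.T @ M @ A, C.T @ M @ A])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:mu, :], sol[mu:, :]        # FP (mu x n), FE (mv x n)
```

With these gains, \(u_{k}^{\ast} = F_{k}^{(P)}\overline{x}_{k}^{(P)}\) and \(v_{k}^{\ast} = F_{k}^{(E)}\overline{x}_{k}^{(E)}\): each player feeds back his own state estimate, which is the certainty-equivalence structure announced in Sect. 8.2.1.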

8.5.4 Value Functions

The parameters which specify the statistics of the random variables in the LQGDG do not feature in the formulae (8.45) and (8.46) for the players’ optimal strategies, and consequently an inspection of the DP equations (8.39) and (8.40) tells us that the matrices \(\varPi_{k}\) will not be a function of the said parameters; in other words, the matrices \(\varPi_{k}\) are exclusively determined by the deterministic plant’s parameters A, B, C, Q, \(Q_{F}\), \(R_{c}^{(P)}\) and \(R_{c}^{(E)}\). Hence \(\varPi_{k} = P_{k}\), where \(P_{k}\) is the solution of the Riccati equation (8.57) derived for the deterministic LQDG discussed in the Appendix. Upon defining \(P_{k}:= P_{k} + Q\), the optimal gains correspond to the optimal gains in the deterministic LQDG, Eqs. (8.61) and (8.62) in the Appendix, and the players’ optimal gains are

$$\displaystyle\begin{array}{rcl} (F_{k}^{(P)})^{{\ast}}& =& -S_{ B}^{-1}(P_{ k+1} + Q)B^{T}(P_{ k+1} + Q)\{I + C[R_{v} - C^{T}(P_{ k+1} + Q)C]^{-1}C^{T}(P_{ k+1} \\ +Q)\}A& & {}\end{array}$$
(8.47)
$$\displaystyle\begin{array}{rcl} (F_{k}^{(E)})^{{\ast}}& =& -S_{ C}^{-1}(P_{ k+1} + Q)C^{T}(P_{ k+1} + Q)\{I - B[R_{u} + B^{T}(P_{ k+1} + Q)B]^{-1}B^{T}(P_{ k+1} \\ +Q)\}A& & {}\end{array}$$
(8.48)

The recursions for the scalars \(c_{k}^{(P)}\) and \(c_{k}^{(E)}\) are obtained from the respective DP equations (8.39) and (8.40):

$$\displaystyle\begin{array}{rcl} c_{k}^{(P)}& =& c_{ k+1}^{(P)} \\ & +& 2\ Trace((F_{k}^{(P)}))^{{\ast}T}C^{T}(Q + P_{ k+1}K_{k+1}^{(P)}H^{(P)})A(P_{ k}^{(P)} -\tilde{ P}_{ k}^{(P,E)}) \\ & +& Trace(A^{T}QAP_{ k}^{(P)}) + Trace(\varGamma ^{T}Q\varGamma Q_{ p}) \\ & +& Trace(\varGamma ^{T}(H^{(P)})^{T}(K_{ k+1}^{(P)})^{T}P_{ k+1}K_{k+1}^{(P)}H^{(P)}\varGamma Q_{ p}) \\ & +& Trace((K_{k+1}^{(P)})^{T}P_{ k+1}K_{k+1}^{(P)}R_{ m}^{(P)}) \\ & +& Trace(A^{T}(H^{(P)})^{T}(K_{ k+1}^{(P)})^{T}P_{ k+1}K_{k+1}^{(P)}H^{(P)}AP_{ k}^{(P)}),\ k = N - 2,\ldots,0{}\end{array}$$
(8.49)

and for \(k = N - 1\) we use the end-game equation

$$\displaystyle\begin{array}{rcl} c_{N-1}^{(P)}(P_{ N-1}^{(P)};P_{ N-1}^{(E)},\tilde{P}_{ N-1}^{(P,E)})& =& Trace(\ A^{T}Q_{ F}AP_{N-1}^{(P)} + 2(P_{ N-1}^{(P)} \\ & -& \tilde{P}_{N-1}^{(P,E)}P_{ N-1}^{(P)})A^{T}Q_{ F}C(F_{N-1}^{(E)})^{{\ast}}- ((F_{ N-1}^{(E)})^{{\ast}})^{T}(R_{ v} \\ & -& C^{T}Q_{ F}C)(F_{N-1}^{(E)})^{{\ast}}P_{ N-1}^{(E,P)} +\varGamma ^{T}Q_{ F}\varGamma Q_{p}\ ) {}\end{array}$$
(8.50)

Similarly,

$$\displaystyle\begin{array}{rcl} c_{k}^{(E)}& =& c_{ k+1}^{(E)} \\ & +& 2\ Trace((F_{k}^{(E)})^{{\ast}})^{T}B^{T}(Q + P_{ k+1}K_{k+1}^{(E)}H^{(E)})A(P_{ k}^{(E)} -\tilde{ P}_{ k}^{(E,P)}) \\ & +& Trace(A^{T}QAP_{ k}^{(E)}) + Trace(\varGamma ^{T}Q\varGamma Q_{ p}) \\ & +& Trace(\varGamma ^{T}(H^{(E)})^{T}(K_{ k+1}^{(E)})^{T}P_{ k+1}K_{k+1}^{(E)}H^{(E)}\varGamma Q_{ p}) \\ & +& Trace((K_{k+1}^{(E)})^{T}P_{ k+1}K_{k+1}^{(E)}R_{ m}^{(E)}) \\ & +& Trace(A^{T}(H^{(E)})^{T}(K_{ k+1}^{(E)})^{T}P_{ k+1}K_{k+1}^{(E)}H^{(E)}AP_{ k}^{(E)}),\ k = N - 2,\ldots,0{}\end{array}$$
(8.51)

and for \(k = N - 1\) we use the end-game equation

$$\displaystyle\begin{array}{rcl} c_{N-1}^{(E)}(P_{ N-1}^{(E)};P_{ N-1}^{(P)},\tilde{P}_{ N-1}^{(E,P)})& =& Trace(\ A^{T}Q_{ F}AP_{N-1}^{(E)} + ((F_{ N-1}^{(P)})^{{\ast}})^{T}(R_{ u} \\ & +& B^{T}Q_{ F}B)(F_{N-1}^{(P)})^{{\ast}}P_{ N-1}^{(E,P)} + 2((F_{ N-1}^{(P)})^{{\ast}})^{T}B^{T}Q_{ F}A(P_{N-1}^{(E)} \\ & -& \tilde{P}_{N-1}^{(E,P)} +\varGamma ^{T}Q_{ F}\varGamma Q_{p})\ ) {}\end{array}$$
(8.52)

Remark.

Only the parameters A, B, C, Q, \(Q_{F}\), \(R_{c}^{(P)}\) and \(R_{c}^{(E)}\) feature in the Riccati equation for \(P_{k}\), as if the game were the deterministic LQDG. The players’ measurement matrices, the process noise parameters and the measurement noise covariances do not feature in Eq. (8.57). However, the solution \(P_{k}\) of the Riccati equation (8.57) and the LQGDG’s measurement-related parameters \(H^{(P)}\), \(H^{(E)}\), the process noise parameters, \(R_{m}^{(P)}\) and \(R_{m}^{(E)}\), and the Kalman gains all enter the recursions for the “intercepts” \(c^{(P)}\) and \(c^{(E)}\).

8.6 Main Result

The analysis of the LQGDG with a control-sharing information pattern is summarized in the following

Theorem 1.

Consider the LQGDG (8.1)–(8.6) with the information pattern:

  1. The P and E players’ prior information is given in Eqs. (8.2) and (8.4), respectively. The prior information \(\overline{x}_{0}^{(P)}\) and \(\overline{x}_{0}^{(E)}\) is private information of the respective P and E players and it is not shared among the P and E players. The covariances \(P_{0}^{(P)}\) and \(P_{0}^{(E)}\) are finite and are public information.

  2. At decision time 1 ≤ k ≤ N − 1 the measurements of player P and player E are \(z_{k}^{(P)}\) and \(z_{k}^{(E)}\) and their measurement equations are Eqs. (8.3) and (8.5), respectively. At decision time 1 ≤ k ≤ N − 1 the respective measurement records \(Z_{k}^{(P)} =\{ z_{1}^{(P)},\ldots,z_{k}^{(P)}\}\) and \(Z_{k}^{(E)} =\{ z_{1}^{(E)},\ldots,z_{k}^{(E)}\}\) are the private information of players P and E and the measurements are not shared among the P and E players.

  3. At decision time \(k = 1,\ldots,N - 1\) the P and E players have complete recall of their respective ownship control histories \(U_{k} =\{ u_{0},\ldots,u_{k-1}\}\) and \(V _{k} =\{ v_{0},\ldots,v_{k-1}\}\).

  4. The players observe their opponent’s moves: at decision time 1 ≤ k ≤ N − 1 the control history \(V _{k} =\{ v_{0},\ldots,v_{k-1}\}\) of player E is known to player P and, similarly, player E knows the control history \(U_{k} =\{ u_{0},\ldots,u_{k-1}\}\) of player P.

The players obtain their private state estimates \(\overline{x}_{k}^{(P)}\) and \(\overline{x}_{k}^{(E)}\) by running two separate Kalman Filters (KFs) in parallel, driven by their private prior information and their separate measurements: player P initializes his KF (8.11)–(8.15) with his prior information \((\overline{x}_{0}^{(P)},P_{0}^{(P)})\) and uses his measurements \(z_{k}^{(P)}\). Similarly, player E initializes his KF (8.16)–(8.20) with his prior information \((\overline{x}_{0}^{(E)},P_{0}^{(E)})\) and uses his measurements \(z_{k}^{(E)}\). Both players use the shared complete input history.

The players reuse the state feedback optimal strategies derived for the deterministic LQDG as provided by Theorem A1: In Eq. (8.61) player P sets \(x_{k}:= \overline{x}_{k}^{(P)}\) and in Eq. (8.62) player E sets \(x_{k}:= \overline{x}_{k}^{(E)}\).

A Nash equilibrium for the “zero-sum” LQGDG with a control-sharing information pattern is established. The value functions of players P and E are

$$\displaystyle\begin{array}{rcl} V _{k}^{(P)}(\overline{x}_{ k}^{(P)},P_{ k}^{(P)};P_{ k}^{(E)},\tilde{P}_{ k}^{(E,P)})& =& (\overline{x}_{ k}^{(P)})^{T}P_{ k}\overline{x}_{k}^{(P)} + c_{ k}^{(P)} {}\\ V _{k}^{(E)}(\overline{x}_{ k}^{(E)},P_{ k}^{(E)};P_{ k}^{(P)},\tilde{P}_{ k}^{(E,P)})& =& (\overline{x}_{ k}^{(E)})^{T}P_{ k}\overline{x}_{k}^{(E)} + c_{ k}^{(E)} {}\\ \end{array}$$

where the matrices \(P_{k}\) are the solution of the Riccati equation (8.57). The “intercepts” \(c_{k}^{(P)}\) and \(c_{k}^{(E)}\) are obtained by solving the respective scalar recursions [(8.49), (8.50), (8.48)] and [(8.51), (8.52), (8.47)]. The covariance matrices \(P_{k}^{(P)}\), \(P_{k}^{(E)}\) and \(\tilde{P}_{k}^{(E,P)}\) exclusively feature in the intercepts’ recursions. The matrices \(\tilde{P}_{k}^{(E,P)}\) are given by the solution of the Lyapunov-like linear matrix equation (8.29). The control Riccati equation (8.57), the KF Riccati equation (8.12), (8.13), (8.15) of player P, the KF Riccati equation (8.17), (8.18), (8.20) of player E, and the Lyapunov-like linear matrix equation (8.29) can all be solved ahead of time and off line. Once the three Riccati equations and the Lyapunov equation have been solved, the value functions’ “intercepts” \(c_{k}^{(P)}\) and \(c_{k}^{(E)}\) are also obtained off line. □
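The computational flow of Theorem 1 thus separates into an off-line phase and an on-line phase. The sketch below strings together the hypothetical helpers kf_step, offline_covariances and optimal_gains introduced earlier in this chapter; the sequence Pi (the solution \(\varPi_{k} = P_{k}\) of the deterministic Riccati equation (8.57)) is taken as an input because Eq. (8.57) is stated only in the Appendix and is not reproduced here.

```python
def offline_phase(A, B, C, Gam, Qp, HP, RmP, HE, RmE, P0P, P0E, Ru, Rv, Q, Pi, N):
    """Off-line computations of Theorem 1: filter covariances and cross term
    (Sect. 8.3) and the certainty-equivalent gains (Sect. 8.5).  `Pi` is the
    list Pi_0,...,Pi_N from the Appendix Riccati equation (8.57)."""
    covs = offline_covariances(A, Gam, Qp, HP, RmP, HE, RmE, P0P, P0E, N)
    gains = [optimal_gains(A, B, C, Ru, Rv, Q + Pi[k + 1]) for k in range(N)]
    return covs, gains

def play_P(k, xbarP, PP, u_prev, v_prev, z_k, gains, A, B, C, Gam, Qp, HP, RmP):
    """One on-line step of player P (player E is symmetric): update the private
    Kalman filter with the SHARED previous controls and the private measurement,
    then apply the certainty-equivalent feedback of Theorem 1, x_k := xbar_k^(P)."""
    if k > 0:
        xbarP, PP, _ = kf_step(xbarP, PP, u_prev, v_prev, z_k, A, B, C, Gam, Qp, HP, RmP)
    FP_k, _ = gains[k]
    u_k = FP_k @ xbarP
    return u_k, xbarP, PP
```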

8.7 Conclusion

Linear-Quadratic Gaussian Dynamic Games with a control-sharing information pattern have been considered. The players’ initial state information and their measurements are private information, but each player is able to observe his antagonist’s past inputs: the protagonists’ past controls are shared information. Although this is a game with partial information, the control-sharing information pattern renders the game amenable to solution by the method of DP, and a Nash equilibrium for the “zero-sum” LQGDG is established. The attendant optimal strategies of the LQGDG with a control-sharing information pattern are linear and certainty equivalence holds. The linearity of the optimal strategies has not been artificially imposed from the outset but follows from the LQG nature of the optimization problem at hand, courtesy of the control-sharing information pattern. The correct solution of LQGDGs with a control-sharing information pattern is obtained in closed form.