
1 Introduction

The purpose of this research investigation is to introduce the problem of control of distributed stochastic systems, to propose risk-averse decision making toward performance uncertainty, and to indicate emergent approaches for future research and development. The need for broad flexibility and adaptability in the decision and control architectures of distributed control has spurred many large-scale applications, such as military command and control hierarchies, spacecraft constellations, remotely piloted platform formations, and teams of humans and autonomous robots, wherein each member can be in best response to its neighbors' actions and yet has no influence on members with which it has no communication support.

Despite the broad interest in distributed systems, significant hurdles remain in applying them to practical problems of interest. The interplay between coalition objectives and individual member objectives can yield surprises and complex behaviors. Thus motivated, the main problem of the research herein is the control of distributed systems via a game-theoretic framework with performance risk aversion. To the best knowledge of the authors, most studies, for instance [1, 2], have concentrated on the selection of open-loop and/or closed-loop Nash strategy equilibria in accordance with expected utilities under the structural constraints of linear system dynamics, quadratic cost functionals, and additive independent white Gaussian noises corrupting the system dynamics and measurements. Very little work, if any, has been published on the higher-order assessment of performance uncertainty and risks beyond expected performance.

For this reason, attention in this research investigation is directed primarily toward a linear-quadratic class of nonzero-sum differential games with linear system dynamics, quadratic cost functionals, and independent white zero-mean Gaussian noises additively corrupting the system dynamics and output measurements. Notice that, under these conditions, the quadratic cost functionals or outcomes associated with the game are random variables with generalized chi-squared probability distributions. If a measure of uncertainty such as the variance of the possible outcomes were used in addition to the expected outcome, the incumbent agents or controllers should, it would seem, be able to correctly order preferences for alternatives. This claim is plausible, but it is not always correct. Various investigations have indicated that any evaluation scheme based on just the expected outcome and outcome variance necessarily implies indifference between some courses of action; therefore, no criterion based solely on the two attributes of mean and variance can correctly represent such preferences. See the works [3, 4] for more details.

Recent accounts by the first author [5, 6] have addressed risk aversion for performance uncertainty of cooperative and noncooperative large-scale stochastic systems, wherein the shape and functional form of a utility function tell a great deal about the basic attitudes of the agents or controllers toward uncertain outcomes or performance risks. In particular, the new utility function, or so-called generalized performance index, proposed therein as a linear manifold defined by a finite number of semi-invariants associated with a random quadratic cost functional, provides a convenient representation for apportioning performance robustness and reliability requirements into the multi-attribute requirement of qualitative characteristics of expected performance and performance risks.

The present research contributions extend the existing results in [7] toward the following unexplored areas: (1) the design of decentralized filtering via constrained filters for each self-directed agent subject to the linear-quadratic class of nonzero-sum stochastic differential games; (2) an efficient and tractable procedure that calculates exactly all the mathematical statistics associated with the generalized chi-squared performance measure for each self-directed agent; and (3) risk-averse control and decision synthesis via a person-by-person equilibrium for reliable performance.

Given the aforementioned background, the article is organized as follows. Section 2 contains the problem description, in which basic assumptions related to the state-space model associated with each incumbent decision maker or controller residing in the distributed system are discussed. In addition, the development of mathematical statistics for performance robustness, whose backward-in-time differential equations are characterized by making use of both the compactness of the state-space representation and quantitative a-priori knowledge of the underpinning probabilistic processes, is presented in detail. Subsequently, Sect. 3 provides the complete problem statements of statistical optimal decision making via the person-by-person equilibrium framework, the requisite notations, terminologies, and definitions, as well as the necessary and sufficient conditions for the existence of person-by-person equilibrium strategies. With regard to the theoretical constructs and design principles for distributed stochastic systems, including the requirements of performance reliability, decision making with risk consequences, and emerging effects within the stochastic environment, the understanding of performance variations, risk-averse attitudes, and the course corrections required in realistic situations is determined in Sect. 4. Finally, conclusions pertaining to decisions with risk consequences and output-feedback design for the linear-quadratic class of distributed stochastic linear systems with quadratic performance appraisals are presented in Sect. 5.

2 Mathematical Statistics for Performance Robustness

Before going into a formal presentation, it is necessary to fix some notation used throughout this article. Time t is modeled as continuous and the time interval of interest is \([t_{0},t_{f}]\). All random variables are defined on a probability space \((\Omega,\mathcal{F},\mathcal{P})\), a triple consisting of a set Ω, a σ-algebra \(\mathcal{F}\), and a probability measure \(\mathcal{P}: \mathcal{F}\mapsto [0,1]\), equipped with a filtration \(\{\mathcal{F}_{t}: t \in [t_{0},t_{f}]\}\). In addition, for a given Hilbert space X with norm \(\vert \vert \cdot \vert \vert _{X}\) and \(1 \leq p \leq \infty\), a Banach space is defined as follows

$$\displaystyle\begin{array}{rcl} \mathcal{L}_{\mathcal{F}}^{p}(t_{ 0},t_{f};X)& & \triangleq \left \{\phi: [t_{0},t_{f}] \times \Omega \mapsto X\,\,\mbox{ is an $X$-valued $\mathcal{F}_{t}$-measurable process}\right. \\ & & \left.\quad \mbox{ with}\,\,E\left \{\int _{t_{0}}^{t_{f} }\vert \vert \phi (t,\omega )\vert \vert _{X}^{p}dt\right \} < \infty \right \} {}\end{array}$$
(1)

with norm

$$\displaystyle{ \vert \vert \phi (\cdot )\vert \vert _{\mathcal{F},p} \triangleq {\left (E\left \{\int _{t_{0}}^{t_{f} }\vert \vert \phi (t,\omega )\vert \vert _{X}^{p}\,dt\right \}\right )}^{1/p}\,. }$$
(2)

Furthermore, the Banach space of X-valued continuous functionals on \([t_{0},t_{f}]\) with the max-norm induced by \(\vert \vert \cdot \vert \vert _{X}\) is denoted by \(\mathcal{C}(t_{0},t_{f};X)\). The deterministic versions of (1) and its associated norm (2) are written as \({\mathcal{L}}^{p}(t_{0},t_{f};X)\) and \(\vert \vert \cdot \vert \vert _{p}\).

A distributed stochastic system that evolves over \([t_{0},t_{f}]\) captures interactions among a finite number of incumbent systems. Each incumbent system that enters the distributed system is assigned a unique positive integer-valued index. The set of indices of incumbent systems is denoted by \(\mathcal{I}\triangleq \{ 1,2,\ldots,N\}\) and a typical element by i. The set of immediate neighbors that have communication paths with an incumbent system i is denoted by \(\mathcal{N}_{i}\), whose cardinality is denoted by \(N_{i}\). For concreteness, the heterogeneity of incumbent system \(i \in \mathcal{I}\) is distinguished by an individual state governed by the stochastic differential equation with known initial condition \(x_{i}(t_{0}) = x_{i}^{0}\) and \(t \in [t_{0},t_{f}]\)

$$\displaystyle{ dx_{i}(t) =\Bigg (A_{ii}(t)x_{i}(t) + B_{ii}(t)u_{i}(t) +\sum _{ j=1}^{N_{i} }B_{ij}(t)u_{ij}(t)\Bigg)dt + G_{ii}(t)dw_{i}(t)\,, }$$
(3)

where the continuous-time coefficients \(A_{ii} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{n_{i}\times n_{i}})\), \(B_{ii} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{n_{i}\times m_{i}})\), \(B_{ij} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{n_{i}\times r_{i}})\) and \(G_{ii} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{n_{i}\times p_{i}})\) are deterministic matrix-valued functions. At time t, the state of incumbent system i is denoted by \(x_{i} \in \mathcal{L}_{\mathcal{F}_{i}}^{2}(t_{0},t_{f}; {\mathbb{R}}^{n_{i}})\) with the initial state \(x_{i}^{0} \in {\mathbb{R}}^{n_{i}}\) known. The control policy applied by agent i to system i is represented by \(u_{i} \in \mathcal{L}_{\mathcal{F}_{i}}^{2}(t_{0},t_{f}; {\mathbb{R}}^{m_{i}})\).

In addition, the interconnection inputs to incumbent system i supported by the communication paths from its immediate neighbors \(j \in \mathcal{N}_{i}\) are viewed as the real-valued functions \(u_{ij}(t)dt\) of the following random processes

$$\displaystyle{ u_{ij}(t)dt = (C_{ij}(t)x_{j}(t) + D_{ij}(t)u_{j}(t))dt + G_{ij}(t)dw_{j}(t)\,,\quad \,j \in \mathcal{N}_{i} }$$
(4)

where continuous-time coefficients \(C_{ij} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{r_{i}\times n_{j}})\), \(D_{ij} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{r_{i}\times m_{j}})\) and \(G_{ij} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{r_{i}\times p_{j}})\) are deterministic matrix-valued functions.

In the state-space representation (3) and (4) one postulates independent Wiener processes \(w_{i}(t) \triangleq w_{i}(t,\omega _{i}): [t_{0},t_{f}] \times \Omega _{i}\mapsto {\mathbb{R}}^{p_{i}}\) and \(w_{j}(t) \triangleq w_{j}(t,\omega _{j}): [t_{0},t_{f}] \times \Omega _{j}\mapsto {\mathbb{R}}^{p_{j}}\) defined by the underlying filtered probability spaces \((\Omega _{i},\mathcal{F}_{i},\{\mathcal{F}_{i}\}_{t},\mathcal{P}_{i})\) and \((\Omega _{j},\mathcal{F}_{j},\{\mathcal{F}_{j}\}_{t},\mathcal{P}_{j})\) with the correlations of independent increments

$$\displaystyle\begin{array}{rcl} E\left \{[w_{i}(\tau _{1}) - w_{i}(\tau _{2})]{[w_{i}(\tau _{1}) - w_{i}(\tau _{2})]}^{T}\right \}& =& W_{ i}\vert \tau _{1} -\tau _{2}\vert,\quad W_{i} > 0\,,\quad \tau _{1},\tau _{2} \in [t_{0},t_{f}] {}\\ E\left \{[w_{j}(\tau _{1}) - w_{j}(\tau _{2})]{[w_{j}(\tau _{1}) - w_{j}(\tau _{2})]}^{T}\right \}& =& W_{ j}\vert \tau _{1} -\tau _{2}\vert,\quad W_{j} > 0\,,\quad \tau _{1},\tau _{2} \in [t_{0},t_{f}] {}\\ \end{array}$$

approximate the inherent system uncertainty due to variability and lack of knowledge.
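To make the interconnection model (3)–(4) concrete, the following minimal sketch simulates two coupled incumbent systems by an Euler–Maruyama discretization. All coefficient values, the scalar dimensions, and the choice of zero control inputs are hypothetical placeholders for illustration only, not values from the text.

```python
import numpy as np

# Hypothetical two-agent instance of (3)-(4): scalar states, zero controls.
# Agent 1 is driven by its own noise and by the interconnection input u_12
# received from agent 2 over the communication path.
rng = np.random.default_rng(0)
dt, T = 1e-3, 1.0
n_steps = int(T / dt)

A11, A22 = -1.0, -0.5          # stable local dynamics A_ii
B12, C12, G12 = 1.0, 0.3, 0.1  # coupling from agent 2 into agent 1, per (4)
G11, G22 = 0.2, 0.2            # local diffusion coefficients G_ii
W1, W2 = 1.0, 1.0              # Wiener increment intensities W_i

x1, x2 = 1.0, -1.0             # known initial conditions x_i(t0)
for _ in range(n_steps):
    dw1 = rng.normal(0.0, np.sqrt(W1 * dt))
    dw2 = rng.normal(0.0, np.sqrt(W2 * dt))
    # Interconnection input (4): u_12 dt = (C12 x2 + D12 u2) dt + G12 dw2, u2 = 0
    u12_dt = C12 * x2 * dt + G12 * dw2
    # State equations (3) with u1 = u2 = 0
    x1 += A11 * x1 * dt + B12 * u12_dt + G11 * dw1
    x2 += A22 * x2 * dt + G22 * dw2

print(x1, x2)  # endpoint of one sample path
```

Note that the same increment dw2 enters both the interconnection input into agent 1 and agent 2's own dynamics, reflecting that the interconnection noise originates at the neighbor.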

With the local agent dynamics (3) and the intertemporal interactions (4), the recursive dynamics of each interconnected system that evolves over \([t_{0},t_{f}]\) and captures direct interactions between incumbent agent i and its immediate neighbors \(j \in \mathcal{N}_{i}\) are now given by

$$\displaystyle{ ds_{i}(t) =\Bigg (A_{i}(t)s_{i}(t) + B_{i}(t)u_{i}(t) +\sum _{ j=1,j\neq i}^{N_{i} }B_{j}(t)u_{j}(t)\Bigg)dt + G_{i}(t)d\xi _{i}(t), }$$
(5)

where for each incumbent agent i, the aggregate Wiener process \(\xi _{i} \triangleq {\left [\begin{array}{ccc} w_{1}^{T}&\ldots &w_{N_{i}}^{T} \end{array} \right ]}^{T}\) has the correlations of independent increments

$$\displaystyle{ E\left \{[\xi _{i}(\tau _{1}) -\xi _{i}(\tau _{2})]{[\xi _{i}(\tau _{1}) -\xi _{i}(\tau _{2})]}^{T}\right \} = \Xi _{ i}\vert \tau _{1} -\tau _{2}\vert \,,\quad \forall \,\tau _{1},\tau _{2} \in [t_{0},t_{f}]\,,\quad \Xi _{i} > 0 }$$

whereas for each incumbent agent i, the augmented state variable \(s_{i}\), its initial-value condition \(s_{i}(t_{0}) = s_{i}^{0}\), and the local game coefficients and parameters are defined by

$$\displaystyle\begin{array}{rcl} s_{i}(t)& & \triangleq \!\left [\begin{array}{c} x_{1}(t)\\ \vdots \\ x_{N_{i}}(t) \end{array} \!\right ];\,s_{i}^{0} \triangleq \!\left [\!\begin{array}{c} x_{1}^{0} \\ \vdots \\ x_{N_{i}}^{0} \end{array} \!\right ];\,A_{i} \triangleq \left [\!\begin{array}{cccc} A_{11} & B_{12}C_{12} & \ldots & B_{1N_{i}}C_{1N_{i}} \\ B_{21}C_{21} & A_{22} & \ldots & B_{2N_{i}}C_{2N_{i}}\\ \vdots & \vdots & \ddots & \vdots \\ B_{N_{i}1}C_{N_{i}1} & \ldots & B_{N_{i}(N_{i}-1)}C_{N_{i}(N_{i}-1)} & A_{N_{i}N_{i}} \end{array} \!\right ] {}\\ B_{1}& & \triangleq \left [\begin{array}{c} B_{11} \\ B_{21}D_{21}\\ \vdots \\ B_{N_{i}1}D_{N_{i}1} \end{array} \right ];\quad B_{2} \triangleq \left [\begin{array}{c} B_{12}D_{12} \\ B_{22}\\ \vdots \\ B_{N_{i}2}D_{N_{i}2} \end{array} \right ];\quad B_{N_{i}} \triangleq \left [\begin{array}{c} B_{1N_{i}}D_{1N_{i}}\\ \vdots \\ B_{(N_{i}-1)N_{i}}D_{(N_{i}-1)N_{i}} \\ B_{N_{i}N_{i}} \end{array} \right ] {}\\ G_{i}& & \triangleq \!\left [\begin{array}{cccc} G_{11} & B_{12}G_{12} & \ldots & B_{1N_{i}}G_{1N_{i}} \\ B_{21}G_{21} & G_{22} & \ldots & B_{2N_{i}}G_{2N_{i}}\\ \vdots & \vdots & \ddots & \vdots \\ B_{N_{i}1}G_{N_{i}1} & \ldots & B_{N_{i}(N_{i}-1)}G_{N_{i}(N_{i}-1)} & G_{N_{i}N_{i}} \end{array} \right ];\quad \Xi _{i} \triangleq \!\left [\begin{array}{cccc} W_{1} & 0 & \ldots & 0 \\ 0 &W_{2} & \ldots & 0 \\ \vdots & 0 & \ddots & \vdots\\ 0 & \ldots &0 &W_{ N_{i}} \end{array} \right ]. {}\\ \end{array}$$

Practical situations in which self-autonomy is possible require that each agent possess common knowledge of the parameters associated with the potential noncooperative interactions (5). Viewed from the mutual influence of one agent on the others, the self-autonomy preferred by incumbent agent i is therefore described by a surrogate model with the initial value \(z_{i}(t_{0}) = z_{i}^{0} = s_{i}^{0}\)

$$\displaystyle{ dz_{i}(t) =\Bigg (A_{i}(t)z_{i}(t) + B_{i}(t)u_{i}(t) +\sum _{ j=1,j\neq i}^{N_{i} }B_{j}(t)u_{j}(t)\Bigg)dt + G_{i}(t)d\xi _{i}(t)\,, }$$
(6)

whereby each incumbent agent \(i \in \mathcal{I}\) is presumed to observe all interactions

$$\displaystyle{ \sum _{j=1,j\neq i}^{N_{i} }B_{j}(t)u_{j}(t)dt }$$

from its immediate neighbors, which are in turn corrupted by an uncorrelated stationary Wiener measurement noise process. Specifically, the following observations are locally available at incumbent agent i for \(t \in [t_{0},t_{f}]\)

$$\displaystyle{ u_{-i}(t)dt =\sum _{ j=1,j\neq i}^{N_{i} }B_{j}(t)u_{j}(t)dt + d\eta _{i}(t)\,. }$$
(7)

For the completely decentralized information pattern, it is also assumed that the incomplete information structure available at each incumbent agent i consists of a linear transformation \(C_{i} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{q_{i}\times \sum _{j=1}^{N_{i}}n_{ j}})\) of the states \(z_{i}(t)\) through the local online data \(\{y_{i}(\tau ):\tau \in [t_{0},t]\}\)

$$\displaystyle{ dy_{i}(t) = C_{i}(t)z_{i}(t)dt + dv_{i}(t)\,. }$$
(8)

Notice that all incumbent agents operate within the common and local environments modeled by the filtered probability spaces. These environments are characterized by the following uncorrelated stationary Wiener processes adapted to \([t_{0},t_{f}]\), with the correlations of independent increments, for all \(\tau _{1},\tau _{2} \in [t_{0},t_{f}]\),

$$\displaystyle\begin{array}{rcl} E\left \{[\eta _{i}(\tau _{1}) -\eta _{i}(\tau _{2})]{[\eta _{i}(\tau _{1}) -\eta _{i}(\tau _{2})]}^{T}\right \}& =& I_{ i}\vert \tau _{1} -\tau _{2}\vert {}\\ E\left \{[v_{i}(\tau _{1}) - v_{i}(\tau _{2})]{[v_{i}(\tau _{1}) - v_{i}(\tau _{2})]}^{T}\right \}& =& V _{ i}\vert \tau _{1} -\tau _{2}\vert {}\\ \end{array}$$

whose a-priori second-order statistics \(I_{i}\) and \(V_{i} > 0\) for \(i = 1,\ldots,N\) are assumed known.

At this point, each decentralized filter associated with incumbent agent \(i \in \mathcal{I}\), whose output is the conditional mean estimate \(\hat{z}_{i}(t)\) of the current state \(z_{i}(t)\) for \(t \in [t_{0},t_{f}]\), has the form, with initial-value condition \(\hat{z}_{i}(t_{0}) = z_{i}^{0}\),

$$\displaystyle{ d\hat{z}_{i}(t) = (A_{i}(t)\hat{z}_{i}(t) + B_{i}(t)u_{i}(t) + u_{-i}(t))dt + L_{i}(t)[dy_{i}(t) - C_{i}(t)\hat{z}_{i}(t)dt]\,, }$$
(9)

whereby the decentralized filter gain \(L_{i}(t)\), \(i \in \mathcal{I}\), is given by

$$\displaystyle{ L_{i}(t) = \Sigma _{i}(t)C_{i}^{T}(t)V _{ i}^{-1} }$$
(10)

and is supported by the estimate-error covariance differential equation with the initial-value condition \(\Sigma _{i}(t_{0}) = 0\)

$$\displaystyle{ \dot{\Sigma }_{i}(t) = A_{i}(t)\Sigma _{i}(t)\!+\Sigma _{i}(t)A_{i}^{T}\!(t)\!+G_{ i}(t)\Xi _{i}G_{i}^{T}\!(t)\!+I_{ i}-\Sigma _{i}(t)C_{i}^{T}\!(t)V _{ i}^{-1}\!C_{ i}(t)\Sigma _{i}(t). }$$
(11)
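A minimal numerical sketch of the decentralized filter equations (10) and (11) follows; the matrices chosen for \(A_i\), \(C_i\), \(G_i\), \(\Xi_i\), \(I_i\), and \(V_i\) are hypothetical, time-invariant values, and the covariance equation (11) is propagated from \(\Sigma_i(t_0)=0\) by explicit Euler steps rather than a production ODE solver.

```python
import numpy as np

# Hypothetical 2-state instance of the covariance equation (11) and gain (10).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # A_i(t), held constant
C = np.array([[1.0, 0.0]])                  # C_i(t): only the first state is measured
G = np.eye(2)                               # G_i(t)
Xi = 0.1 * np.eye(2)                        # process-noise intensity Xi_i
I_i = 0.05 * np.eye(2)                      # interaction-noise intensity I_i
V = np.array([[0.01]])                      # measurement-noise intensity V_i > 0
V_inv = np.linalg.inv(V)

dt, T = 1e-4, 2.0
Sigma = np.zeros((2, 2))                    # initial condition Sigma_i(t0) = 0
for _ in range(int(T / dt)):
    # Right-hand side of the Riccati-type equation (11)
    dSigma = (A @ Sigma + Sigma @ A.T + G @ Xi @ G.T + I_i
              - Sigma @ C.T @ V_inv @ C @ Sigma)
    Sigma += dt * dSigma

L = Sigma @ C.T @ V_inv                     # decentralized filter gain (10)
print(Sigma)
print(L)
```

Because every term on the right-hand side of (11) is symmetric whenever Σ is, the Euler iteration preserves symmetry, and the quadratic correction term keeps the covariance bounded as it approaches its steady state.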

Using the definition for the estimate errors \(\tilde{z}_{i}(t) \triangleq z_{i}(t) -\hat{ z}_{i}(t)\), it can be shown that

$$\displaystyle\begin{array}{rcl} d\tilde{z}_{i}(t)& =& (A_{i}(t) - L_{i}(t)C_{i}(t))\tilde{z}_{i}(t)dt + G_{i}(t)d\xi _{i}(t) - L_{i}(t)dv_{i}(t) - d\eta _{i}(t) \\ \tilde{z}_{i}(t_{0})& =& 0\, {}\end{array}$$
(12)

Incumbent agent i attempts to make risk-bearing decisions \(u_{i}\) from an admissible feedback policy set \(\mathcal{U}_{i} \subset L_{\mathcal{F}_{t}^{i}}^{2}(t_{0},t_{f}; {\mathbb{R}}^{m_{i}})\), the subset of the Hilbert space of \({\mathbb{R}}^{m_{i}}\)-valued square-integrable processes on \([t_{0},t_{f}]\) that are adapted to the σ-algebra \(\mathcal{F}_{t}^{i}\) generated by \(\{y_{i}(\tau ):\tau \in [t_{0},t]\}\), for reliable attainment of payoffs or utilities. Associated with each admissible 2-tuple \((u_{i}(\cdot ),u_{-i}(\cdot ))\) is the generalized chi-squared random measure of performance

$$\displaystyle\begin{array}{rcl} J_{i}(u_{i},u_{-i})& =& z_{i}^{T}(t_{ f})Q_{f}^{i}z_{ i}(t_{f}) \\ & & \quad +\int _{ t_{0}}^{t_{f} }[z_{i}^{T}(\tau )Q_{ i}(\tau )z_{i}(\tau ) + u_{i}^{T}(\tau )R_{ i}(\tau )u_{i}(\tau ) - u_{-i}^{T}(\tau )M_{ i}(\tau )u_{-i}(\tau )]d\tau \,,{}\end{array}$$
(13)

whereby the coefficients \(Q_{f}^{i} \in {\mathbb{R}}^{\sum _{j=1}^{N_{i}}n_{ j}\times \sum _{j=1}^{N_{i}}n_{ j}}\), \(Q_{i} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{\sum _{j=1}^{N_{i}}n_{ j}\times \sum _{j=1}^{N_{i}}n_{ j}})\), \(M_{i} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{\sum _{j=1}^{N_{i}}n_{ j}\times \sum _{j=1}^{N_{i}}n_{ j}})\) and \(R_{i} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{m_{i}\times m_{i}})\), representing relative weightings for terminal and transient trade-offs between the regulation of the responses \(z_{i}\), the effectiveness of the control and/or decision policy \(u_{i}\), and observable variations in the control and/or decision policies of all other neighbors \(u_{-i}\), are deterministic and positive semidefinite with \(R_{i}(t)\) invertible.

Among the research issues for distributed control currently under investigation is how incumbent agent \(i \in \mathcal{I}\) and its immediate neighbors \(j \in \mathcal{N}_{i}\) carry out optimal control and decision synthesis for the control of distributed stochastic systems. The approach adopted to handle the problem with a tuple of two or more control laws or decision policies is the noncooperative game-theoretic paradigm. In particular, an \(N_{i}\)-tuple of policies \(\{u_{1}^{{\ast}},u_{2}^{{\ast}},\ldots,u_{N_{i}}^{{\ast}}\}\) is said to constitute a person-by-person equilibrium solution for the distributed control problem (6) and performance measure (13) if

$$\displaystyle{ J_{i}^{{\ast}}\triangleq J_{ i}(u_{i}^{{\ast}},u_{ -i}^{{\ast}}) \leq J_{ i}(u_{i},u_{-i}^{{\ast}})\,,\qquad \forall i \in \mathcal{I}\,. }$$
(14)

That is, none of the \(N_{i}\) agents can deviate unilaterally from the equilibrium policies and gain from doing so. The justification for the restriction to such an equilibrium is that the coalition effects \(u_{-i}^{{\ast}}\) observed by incumbent agent i do not necessarily support its preference optimization. Therefore, the agents cannot do better than behave as if they strive for this equilibrium. It is reasonable to conclude that a person-by-person equilibrium of distributed control for incumbent agent i and its immediate neighbors \(j \in \mathcal{N}_{i}\) is identical to the concept of a Nash equilibrium within a noncooperative game-theoretic setting.

Moreover, the \(N_{i}\)-tuple \((u_{1}^{{\ast}},\ldots,u_{N_{i}}^{{\ast}})\) of decision laws for incumbent agent i and its immediate neighbors \(j \in \mathcal{N}_{i}\) satisfying the person-by-person equilibrium is also a minimal tuple of decision laws. The reason is that the admissible input sets are continuous and the criteria \(J_{i}\) are continuous, differentiable, and convex in the inputs \(u_{i}\). Hence, a minimal tuple is obtained if the incumbent agents individually optimize their criteria in a parallel fashion. See [8] for more details.
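The parallel individual optimization just described can be illustrated with a toy static quadratic game; the cost coefficients below are hypothetical and chosen so that the simultaneous best-response map is a contraction and therefore converges to the person-by-person (Nash) point.

```python
# Toy two-player quadratic game: J_i(u_i, u_other) = (u_i - a_i)**2 + c*u_i*u_other.
# Each player's best response (setting dJ_i/du_i = 0) is u_i = a_i - (c/2)*u_other.
a1, a2, c = 1.0, 2.0, 0.5   # hypothetical coefficients; |c|/2 < 1 gives a contraction

u1, u2 = 0.0, 0.0
for _ in range(100):
    # Parallel (simultaneous) individual optimization, per the person-by-person scheme
    u1, u2 = a1 - 0.5 * c * u2, a2 - 0.5 * c * u1

def J(u, a, other):
    return (u - a) ** 2 + c * u * other

# Person-by-person check (14): no unilateral perturbation improves either cost.
for du in (-0.1, 0.1):
    assert J(u1 + du, a1, u2) > J(u1, a1, u2)
    assert J(u2 + du, a2, u1) > J(u2, a2, u1)
print(u1, u2)
```

Since each \(J_i\) is strictly convex in its own input, the stationary point of each parallel minimization is the unique best response, matching the argument for minimality above.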

Next, the notion of admissible feedback policy sets is discussed. In the case of incomplete information, an admissible feedback policy \(u_{i}\) for the local best response to all other immediate neighbors \(u_{-i}^{{\ast}}\) must be of the form, for some \(\partial _{i}(\cdot,\cdot )\),

$$\displaystyle{ u_{i}(t) = \partial _{i}(t,y_{i}(\tau ))\,,\quad \tau \in [t_{0},t]\,. }$$
(15)

In general, the conditional density \(p_{i}(z_{i}(t)\vert \mathcal{F}_{t}^{i})\), which is the density of \(z_{i}(t)\) conditioned on \(\mathcal{F}_{t}^{i}\) (i.e., induced by the observations \(\{y_{i}(\tau ):\tau \in [t_{0},t]\}\)), represents the sufficient statistic for describing the conditional stochastic effects of the future feedback policy \(u_{i}\). Under the Gaussian assumption, the conditional density \(p_{i}(z_{i}(t)\vert \mathcal{F}_{t}^{i})\) is parameterized by the locally available conditional mean \(\hat{z}_{i}(t) \triangleq E\{z_{i}(t)\vert \mathcal{F}_{t}^{i}\}\) and covariance \(\Sigma _{i}(t) \triangleq E\{[z_{i}(t) -\hat{ z}_{i}(t)]{[z_{i}(t) -\hat{ z}_{i}(t)]}^{T}\vert \mathcal{F}_{t}^{i}\}\). Under the linear-Gaussian conditions, the covariance \(\Sigma _{i}(t)\) is independent of the feedback policy \(u_{i}(t)\) and of the observations \(\{y_{i}(\tau ):\tau \in [t_{0},t]\}\). Therefore, in the search for an optimal control and/or decision policy \(u_{i}(t)\) of the form (15), it is only required that

$$\displaystyle{ u_{i}(t) =\gamma _{i}(t,\hat{z}_{i}(t))\,,\quad t \in [t_{0},t_{f}]\,. }$$

Given the linear-quadratic properties of the surrogate system description (6)–(13), the search for an optimal feedback solution may be productively restricted to a linear time-varying feedback policy generated from the locally accessible state \(\hat{z}_{i}(t)\) by

$$\displaystyle{ u_{i}(t) = K_{i}(t)\hat{z}_{i}(t)\,,\quad t \in [t_{0},t_{f}] }$$
(16)

with \(K_{i} \in \mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{m_{i}\times \sum _{j=1}^{N_{i}}n_{ j}})\) an admissible feedback gain whose further defining properties will be stated shortly.

Hence, for the admissible pair \((t_{0},z_{i}^{0})\), the observed knowledge of the neighboring disturbances \(u_{-i}^{{\ast}}(\cdot )\), and the admissible feedback policy (16), the aggregation of the dynamics (9) and (12) associated with incumbent agent i is described by the controlled stochastic differential equation

$$\displaystyle{ d{z}^{i}(t) = ({F}^{i}(t){z}^{i}(t) + {E}^{i}(t)u_{ -i}^{{\ast}}(t))dt + {G}^{i}(t)d{w}^{i}(t)\,,\quad {z}^{i}(t_{ 0}) = z_{0}^{i} }$$
(17)

with the performance measure (13) rewritten as follows

$$\displaystyle\begin{array}{rcl} J_{i}(u_{i},u_{-i}^{{\ast}})& =& {({z}^{i})}^{T}(t_{ f})N_{f}^{i}{z}^{i}(t_{ f}) \\ & & +\int _{t_{0}}^{t_{f} }[{({z}^{i})}^{T}(\tau ){N}^{i}(\tau ){z}^{i}(\tau ) - {(u_{ -i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau )]d\tau \,,{}\end{array}$$
(18)

whereby for each incumbent agent \(i \in \mathcal{I}\), the aggregate system state is \({z}^{i} \triangleq {\left [\begin{array}{cc} {(\hat{z}_{i})}^{T}&{(\tilde{z}_{i})}^{T} \end{array} \right ]}^{T}\) and the stationary Wiener process \({w}^{i} \triangleq {\left [\begin{array}{ccc} \xi _{i}^{T}&\eta _{i}^{T}&v_{i}^{T} \end{array} \right ]}^{T}\) has the correlation of independent increments defined as

$$\displaystyle{ E\left \{[{w}^{i}(\tau _{ 1}) - {w}^{i}(\tau _{ 2})]{[{w}^{i}(\tau _{ 1}) - {w}^{i}(\tau _{ 2})]}^{T}\right \} = {W}^{i}\vert \tau _{ 1} -\tau _{2}\vert \,,\quad \forall \,\tau _{1},\tau _{2} \in [t_{0},t_{f}]\,,\,{W}^{i} > 0 }$$

and the aggregate system coefficients are given by, for each \(t \in [t_{0},t_{f}]\)

$$\displaystyle\begin{array}{rcl} {F}^{i}(t)& & \triangleq \left [\begin{array}{cc} A_{i}(t) + B_{i}(t)K_{i}(t)& L_{i}(t)C_{i}(t) \\ 0 &A_{i}(t) - L_{i}(t)C_{i}(t) \end{array} \right ];\quad {E}^{i}(t) \triangleq \left [\begin{array}{c} I_{\sum _{j=1}^{N_{i}}n_{j}\times \sum _{j=1}^{N_{i}}n_{j}} \\ 0 \end{array} \right ] {}\\ {G}^{i}(t)& & \triangleq \left [\begin{array}{ccc} 0 & 0 & L_{i}(t) \\ G_{i}(t)& - I_{\sum _{j=1}^{N_{i}}n_{j}\times \sum _{j=1}^{N_{i}}n_{j}} & - L_{i}(t) \end{array} \right ];\quad N_{f}^{i} \triangleq \left [\begin{array}{cc} Q_{f}^{i}&Q_{ f}^{i} \\ Q_{f}^{i}&Q_{f}^{i} \end{array} \right ];\quad z_{0}^{i} \triangleq \left [\begin{array}{c} z_{i}^{0} \\ 0\end{array} \right ] {}\\ {N}^{i}(t)& & \triangleq \left [\begin{array}{cc} Q_{i}(t) + K_{i}^{T}(t)R_{ i}(t)K_{i}(t)&Q_{i}(t) \\ Q_{i}(t) &Q_{i}(t) \end{array} \right ];\quad {W}^{i} \triangleq \left [\begin{array}{ccc} \Xi _{i}& 0 & 0 \\ 0 &I_{i}& 0 \\ 0 & 0 &V _{i} \end{array} \right ]\,. {}\\ \end{array}$$
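As a quick sanity check on the block definitions above, the sketch below assembles \(F^i\), \(G^i\), \(N^i\), and \(W^i\) from hypothetical low-dimensional components with np.block and confirms that the dimensions are mutually consistent; none of the numerical values come from the text.

```python
import numpy as np

n, m, q = 2, 1, 1                 # hypothetical aggregate-state, input, output sizes
p = 2                             # size of the aggregate Wiener process xi_i
A_i = np.array([[0.0, 1.0], [-1.0, -1.0]])
B_i = np.array([[0.0], [1.0]])
C_i = np.array([[1.0, 0.0]])
G_i = np.eye(n)
K_i = np.array([[-1.0, -2.0]])    # feedback gain, as in (16)
L_i = np.array([[1.0], [0.5]])    # filter gain, as in (10)
Q_i = np.eye(n); R_i = np.eye(m)
Xi_i = np.eye(p); I_i = np.eye(n); V_i = np.eye(q)

Z = np.zeros
# Aggregate coefficients, following the block definitions for F^i, G^i, N^i, W^i
F = np.block([[A_i + B_i @ K_i, L_i @ C_i],
              [Z((n, n)),       A_i - L_i @ C_i]])
G = np.block([[Z((n, p)), Z((n, n)),  L_i],
              [G_i,       -np.eye(n), -L_i]])
N = np.block([[Q_i + K_i.T @ R_i @ K_i, Q_i],
              [Q_i,                     Q_i]])
W = np.block([[Xi_i,      Z((p, n)), Z((p, q))],
              [Z((n, p)), I_i,       Z((n, q))],
              [Z((q, p)), Z((q, n)), V_i]])

# Dimension bookkeeping: z^i stacks estimate and error, w^i stacks (xi, eta, v)
assert F.shape == (2 * n, 2 * n) and N.shape == (2 * n, 2 * n)
assert G.shape == (2 * n, p + n + q) and W.shape == (p + n + q, p + n + q)
```

The upper block-triangular structure of \(F^i\) reflects that the estimation error \(\tilde{z}_i\) evolves autonomously in (12), uninfluenced by the feedback gain.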

Regarding the linear-quadratic structural constraints (17) and (18), the path-wise performance measure (18), toward which incumbent agent i is risk averse, is clearly a random variable of the generalized chi-squared type. Hence, the degree of uncertainty of the path-wise performance measure (18) must be assessed via a complete set of higher-order statistics beyond the statistical mean or average. In an attempt to describe or model performance uncertainty, the essence of information about these higher-order performance-measure statistics is now considered as a source of information flow that will affect the perception of the problem and the environment at the risk-averse incumbent agent i.

Next, the question of how to characterize and influence performance information is answered by modeling and management of cumulants (also known as semi-invariants) associated with (18) as shown in the following result.

Theorem 1 (Cumulant-Generating Function). 

Let each incumbent agent \(i \in \mathcal{I}\) be associated with the state variable \({z}^{i}(\cdot )\) of the stochastic dynamics (17) and be subject to the performance measure (18). Further, let the initial state be \({z}^{i}(\tau ) \equiv z_{\tau }^{i}\) for \(\tau \in [t_{0},t_{f}]\), and let the moment-generating function be denoted by

$$\displaystyle{{ \varphi }^{i}\left (\tau,z_{\tau }^{i},\theta \right ) {=\varrho }^{i}\left (\tau,\theta \right )\exp \left \{{(z_{\tau }^{i})}^{T}{\Upsilon }^{i}(\tau,\theta )z_{\tau }^{i} + 2{(z_{\tau }^{i}){}^{T}\ell}^{i}(\tau,\theta )\right \} }$$
(19)
$$\displaystyle{{ \upsilon }^{i}\left (\tau,\theta \right ) {=\ln \{\varrho }^{i}\left (\tau,\theta \right )\}\,,\qquad \theta \in {\mathbb{R}}^{+}\,. }$$
(20)

Then, the cumulant-generating function has the form of quadratic affine

$$\displaystyle{{ \psi }^{i}\left (\tau,z_{\tau }^{i},\theta \right ) = {(z_{\tau }^{i})}^{T}{\Upsilon }^{i}(\tau,\theta )z_{\tau }^{i} + 2{(z_{\tau }^{i}){}^{T}\ell}^{i}(\tau,\theta ) {+\upsilon }^{i}\left (\tau,\theta \right )\,, }$$
(21)

where the scalar solution \({\upsilon }^{i}\left (\tau,\theta \right )\) solves the scalar-valued backward-in-time differential equation with the terminal-value condition \({\upsilon }^{i}\left (t_{f},\theta \right ) = 0\)

$$\displaystyle\begin{array}{rcl}{ \frac{d} {d\tau }\upsilon }^{i}\left (\tau,\theta \right )& =& -\mathrm{Tr}\left \{{\Upsilon }^{i}(\tau,\theta ){G}^{i}\left (\tau \right ){W}^{i}{({G}^{i})}^{T}\left (\tau \right )\right \} +\theta {(u_{ -i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau ){}\end{array}$$
(22)

whereas the matrix \({\Upsilon }^{i}(\tau,\theta )\) and vector \({\ell}^{i}(\tau,\theta )\) solutions satisfy the matrix and vector-valued backward-in-time differential equations

$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }{\Upsilon }^{i}(\tau,\theta )& =& -{({F}^{i})}^{T}(\tau ){\Upsilon }^{i}(\tau,\theta ) - {\Upsilon }^{i}(\tau,\theta ){F}^{i}(\tau ) \\ & & - 2{\Upsilon }^{i}(\tau,\theta ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau ){\Upsilon }^{i}(\tau,\theta ) -\theta {N}^{i}(\tau )\,,\quad {\Upsilon }^{i}(t_{ f},\theta ) =\theta N_{f}^{i}{}\end{array}$$
(23)
$$\displaystyle\begin{array}{rcl}{ \frac{d} {d\tau }\ell}^{i}(\tau,\theta )& =& -{\Upsilon }^{i}(\tau,\theta ){E}^{i}(\tau )u_{ -i}^{{\ast}}(\tau )\,,{\quad \ell}^{i}(t_{ f},\theta ) = 0\,.{}\end{array}$$
(24)

Meanwhile, the scalar solution \({\varrho }^{i}(\tau,\theta )\) satisfies the scalar-valued backward-in-time differential equation

$$\displaystyle\begin{array}{rcl}{ \frac{d} {d\tau }\varrho }^{i}\left (\tau,\theta \right )& =& {-\varrho }^{i}\left (\tau,\theta \right )[\mathrm{Tr}\left \{{\Upsilon }^{i}(\tau,\theta ){G}^{i}\left (\tau \right ){W}^{i}{({G}^{i})}^{T}\left (\tau \right )\right \} \\ & & -\theta {(u_{-i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau )]\,,{\quad \varrho }^{i}\left (t_{ f},\theta \right ) = 1\,.{}\end{array}$$
(25)

Proof.

For notational simplicity, it is convenient to define, for each \(i \in \mathcal{I}\),

$$\displaystyle{{ \varpi }^{i}\left (\tau,z_{\tau }^{i},\theta \right ) \triangleq \exp \left \{\theta J_{ i}\left (\tau,z_{\tau }^{i}\right )\right \}\,, }$$

in which the performance measure (18) is rewritten as the cost-to-go function from an arbitrary state \(z_{\tau }^{i}\) at a running time \(\tau \in [t_{0},t_{f}]\), that is,

$$\displaystyle\begin{array}{rcl} J_{i}(\tau,z_{\tau }^{i})& =& {({z}^{i})}^{T}(t_{ f})N_{f}^{i}{z}^{i}(t_{ f}) \\ & & +\int _{\tau }^{t_{f} }[{({z}^{i})}^{T}(t){N}^{i}(t){z}^{i}(t) - {(u_{ -i}^{{\ast}})}^{T}(t)M_{ i}(t)u_{-i}^{{\ast}}(t)]dt{}\end{array}$$
(26)

subject to

$$\displaystyle{ d{z}^{i}(t) = ({F}^{i}(t){z}^{i}(t) + {E}^{i}(t)u_{ -i}^{{\ast}}(t))dt + {G}^{i}(t)d{w}^{i}(t)\,,\ \ {z}^{i}(\tau ) = z_{\tau }^{i}\,. }$$
(27)

By definition, the moment-generating function is

$$\displaystyle{{ \varphi }^{i}(\tau,z_{\tau }^{i},\theta ) \triangleq E\left \{{\varpi }^{i}\left (\tau,z_{\tau }^{i},\theta \right )\right \}\,. }$$

Thus, the total time derivative of \({\varphi }^{i}(\tau,z_{\tau }^{i},\theta )\) is obtained as

$$\displaystyle{{ \frac{d} {d\tau }\varphi }^{i}\left (\tau,z_{\tau }^{i},\theta \right ) = -\theta {[{(z_{\tau }^{i})}^{T}{N}^{i}(\tau )z_{\tau }^{i} - {(u_{ -i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau )]\varphi }^{i}\left (\tau,z_{\tau }^{i},\theta \right )\,. }$$

Using the standard Itô formula, it follows that

$$\displaystyle\begin{array}{rcl}{ d\varphi }^{i}\left (\tau,z_{\tau }^{i},\theta \right )& =& E\left \{{d\varpi }^{i}\left (\tau,z_{\tau }^{i},\theta \right )\right \} {}\\ & =& E\Big\{\varpi _{\tau }^{i}(\tau,z_{\tau }^{i},\theta )d\tau +\varpi _{ z_{\tau }^{i}}^{i}(\tau,z_{\tau }^{i},\theta )dz_{\tau }^{i} {}\\ & & \qquad + \frac{1} {2}\mathrm{Tr}\left \{\varpi _{z_{\tau }^{i}z_{\tau }^{i}}^{i}(\tau,z_{\tau }^{i},\theta ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \}d\tau \Big\} {}\\ & =& \varphi _{\tau }^{i}(\tau,z_{\tau }^{i},\theta )d\tau +\varphi _{ z_{\tau }^{i}}^{i}(\tau,z_{\tau }^{i},\theta )({F}^{i}(\tau )z_{\tau }^{i} + {E}^{i}(\tau )u_{ -i}^{{\ast}}(\tau ))d\tau {}\\ & & \quad + \frac{1} {2}\mathrm{Tr}\left \{\varphi _{z_{\tau }^{i}z_{\tau }^{i}}^{i}(\tau,z_{\tau }^{i},\theta ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \}d\tau \,, {}\\ \end{array}$$

which, after substituting the parameterized form of the moment-generating function,

$$\displaystyle{{ \varphi }^{i}\left (\tau,z_{\tau }^{i},\theta \right ) {=\varrho }^{i}\left (\tau,\theta \right )\exp \left \{{(z_{\tau }^{i})}^{T}{\Upsilon }^{i}(\tau,\theta )z_{\tau }^{i} + 2{(z_{\tau }^{i}){}^{T}\ell}^{i}(\tau,\theta )\right \} }$$

together with its partial derivatives, leads to the result

$$\displaystyle\begin{array}{rcl} & & -\theta {[{(z_{\tau }^{i})}^{T}{N}^{i}(\tau )z_{\tau }^{i} - {(u_{ -i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau )]\varphi }^{i}\left (\tau,z_{\tau },\theta \right ) {}\\ & & \quad =\Big\{{ \frac{{\frac{d} {d\tau }\varrho }^{i}\left (\tau,\theta \right )} {\varrho }^{i}\left (\tau,\theta \right )} + {(z_{\tau }^{i})}^{T}\frac{d} {d\tau }{\Upsilon }^{i}(\tau,\theta )z_{\tau }^{i} + 2{(z_{\tau }^{i})}^{T}{\frac{d} {d\tau }\ell}^{i}(\tau,\theta ) {}\\ & & \qquad \quad + {(z_{\tau }^{i})}^{T}\left [{({F}^{i})}^{T}\left (\tau \right ){\Upsilon }^{i}(\tau,\theta ) + {\Upsilon }^{i}(\tau,\theta ){F}^{i}\left (\tau \right )\right ]z_{\tau }^{i} + 2{(z_{\tau }^{i})}^{T}{\Upsilon }^{i}(\tau,\theta ){E}^{i}(\tau )u_{ -i}^{{\ast}}(\tau ) {}\\ & & \qquad \quad + 2{(z_{\tau }^{i})}^{T}{\Upsilon }^{i}(\tau,\theta ){G}^{i}\left (\tau \right ){W}^{i}{({G}^{i})}^{T}\left (\tau \right ){\Upsilon }^{i}(\tau,\theta )z_{\tau }^{i} {}\\ & & \qquad \quad +\mathrm{ Tr}{\left \{{\Upsilon }^{i}(\tau,\theta ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \}\Big\}\varphi }^{i}\left (\tau,z_{\tau }^{i},\theta \right )\,. {}\\ \end{array}$$

For this identity to hold for arbitrary \(z_{\tau }^{i}\), the quadratic, linear, and constant terms must match separately; it is therefore required that

$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }{\Upsilon }^{i}(\tau,\theta )& =& -{({F}^{i})}^{T}(\tau ){\Upsilon }^{i}(\tau,\theta ) - {\Upsilon }^{i}(\tau,\theta ){F}^{i}(\tau ) -\theta {N}^{i}(\tau ) {}\\ & & - 2{\Upsilon }^{i}(\tau,\theta ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau ){\Upsilon }^{i}(\tau,\theta ) {}\\ { \frac{d} {d\tau }\ell}^{i}(\tau,\theta )& =& -{\Upsilon }^{i}(\tau,\theta ){E}^{i}(\tau )u_{ -i}^{{\ast}}(\tau ) {}\\ { \frac{d} {d\tau }\varrho }^{i}(\tau,\theta )& =& {-\varrho }^{i}\left (\tau,\theta \right )[\mathrm{Tr}\left \{{\Upsilon }^{i}(\tau,\theta ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}\!(\tau )\right \} -\theta {(u_{ -i}^{{\ast}})}^{T}\!(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau )] {}\\ \end{array}$$

with the terminal-value conditions \({\Upsilon }^{i}(t_{f},\theta ) =\theta N_{f}^{i}\) and \({\varrho }^{i}(t_{f},\theta ) = 1\). □ 

Finally, the scalar-valued solution \({\upsilon }^{i}(\tau,\theta )\) satisfies the backward-in-time differential equation, with terminal-value condition \({\upsilon }^{i}(t_{f},\theta ) = 0\):

$$\displaystyle{{ \frac{d} {d\tau }\upsilon }^{i}(\tau,\theta ) = -\mathrm{Tr}\left \{{\Upsilon }^{i}(\tau,\theta ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \} +\theta {(u_{ -i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau )\,, }$$

which completes the proof.

As it turns out, all the higher-order characteristics of performance uncertainty and risk are captured in the higher-order statistics of the performance measure (18). These statistics can be generated via a Maclaurin series expansion of the cumulant-generating function, that is, the second characteristic function (21)

$$\displaystyle{{ \psi }^{i}\left (\tau,z_{\tau }^{i},\theta \right ) =\sum _{ r=1}^{\infty }\left.{{{\partial }^{(r)} \over {\partial \theta }^{(r)}}\psi }^{i}(\tau,z_{\tau }^{i},\theta )\right \vert {_{\theta =0}\,{ \theta }^{r} \over r!}\,, }$$
(28)

in which the coefficients \(\kappa _{r}^{i} \triangleq \left.{{{\partial }^{(r)} \over {\partial \theta }^{(r)}} \psi }^{i}(\tau,z_{\tau }^{i},\theta )\right \vert _{\theta =0}\) are defined as the performance-measure statistics associated with incumbent agent i and \(i \in \mathcal{I}\). In particular, the rth performance-measure statistic follows from the cumulant-generating function (21) as

$$\displaystyle\begin{array}{rcl} \kappa _{r}^{i}& =& \left.{{{\partial }^{(r)} \over {\partial \theta }^{(r)}}\psi }^{i}(\tau,z_{\tau }^{i},\theta )\right \vert _{\theta =0} = {(z_{\tau }^{i})}^{T}\left.{{\partial }^{(r)} \over {\partial \theta }^{(r)}}{\Upsilon }^{i}(\tau,\theta )\right \vert _{\theta =0}z_{\tau }^{i} \\ & & \quad + 2{(z_{\tau }^{i})}^{T}\left.{{{\partial }^{(r)} \over {\partial \theta }^{(r)}}\ell}^{i}(\tau,\theta )\right \vert _{\theta =0} + \left.{{{\partial }^{(r)} \over {\partial \theta }^{(r)}}\upsilon }^{i}(\tau,\theta )\right \vert _{\theta =0}\,.{}\end{array}$$
(29)
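As a concrete illustration of the Maclaurin-coefficient definition in (28)–(29), consider a deliberately simple scalar stand-in: a chi-squared random variable with k degrees of freedom, whose cumulant-generating function \(\psi (\theta ) = -(k/2)\ln (1 - 2\theta )\) is known in closed form. The sketch below (an illustrative toy, not the distributed-game computation) recovers the cumulants both from the closed-form expression \(2^{r-1}(r-1)!\,k\) and by numerically differentiating \(\psi\) at \(\theta = 0\):

```python
import math

def psi(theta, k=1):
    """Cumulant-generating function of a chi-squared variable with k dof."""
    return -0.5 * k * math.log(1.0 - 2.0 * theta)

def cumulant_closed_form(r, k=1):
    """r-th cumulant in closed form: kappa_r = 2**(r-1) * (r-1)! * k."""
    return 2 ** (r - 1) * math.factorial(r - 1) * k

def cumulant_numeric(r, k=1, h=1e-3):
    """r-th Maclaurin coefficient of psi, i.e. its r-th derivative at
    theta = 0, via a central finite-difference stencil."""
    total = 0.0
    for j in range(r + 1):
        total += (-1) ** j * math.comb(r, j) * psi((r / 2 - j) * h, k)
    return total / h ** r
```

For k = 2 the first three cumulants are 2, 4, and 16, and the finite-difference values agree with the closed form up to the O(h²) truncation error of the stencil.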

For notational convenience, the change of variables corresponding to each incumbent agent i and \(i \in \mathcal{I}\)

$$\displaystyle\begin{array}{rcl} H_{r}^{i}(\tau )& & \triangleq \left.{{\partial }^{(r)}{\Upsilon }^{i}(\tau,\theta ) \over {\partial \theta }^{(r)}} \right \vert _{\theta =0}\,,\quad \tau \in [t_{0},t_{f}]{}\end{array}$$
(30)
$$\displaystyle\begin{array}{rcl} \breve{D}_{r}^{i}(\tau )& & \triangleq \left.{{\partial {}^{(r)}\ell}^{i}(\tau,\theta ) \over {\partial \theta }^{(r)}} \right \vert _{\theta =0};\quad D_{r}^{i}(\tau ) \triangleq \left.{{\partial {}^{(r)}\upsilon }^{i}(\tau,\theta ) \over {\partial \theta }^{(r)}} \right \vert _{\theta =0}{}\end{array}$$
(31)

is introduced so that the next theorem provides an effective and accurate capability for forecasting all the higher-order characteristics associated with performance uncertainty.

Theorem 2 (Performance-Measure Statistics). 

Associate with each incumbent agent i and \(i \in \mathcal{I}\) the decentralized stochastic system governed by (17) and (18), wherein the pairs \((A_{i},B_{i})\) and \((A_{i},C_{i})\) are uniformly stabilizable and detectable. For fixed \({k}^{i} \in \mathbb{N}\), the \({k}^{i}\)th cumulant of the performance measure (18) of concern to incumbent agent i is given by

$$\displaystyle\begin{array}{rcl} \kappa _{{k}^{i}}^{i} = {(z_{ 0}^{i})}^{T}H_{{ k}^{i}}^{i}(t_{ 0})z_{0}^{i} + 2{(z_{ 0}^{i})}^{T}\breve{D}_{{ k}^{i}}^{i}(t_{ 0}) + D_{{k}^{i}}^{i}(t_{ 0})\,,& &{}\end{array}$$
(32)

where the supporting variables \(\{H_{r}^{i}(\tau )\}_{r=1}^{{k}^{i} }\), \(\{\breve{D}_{r}^{i}(\tau )\}_{r=1}^{{k}^{i} }\) and \(\{D_{r}^{i}(\tau )\}_{r=1}^{{k}^{i} }\), evaluated at \(\tau = t_{0}\), satisfy the differential equations (with the dependence of \(H_{r}^{i}(\tau )\), \(\breve{D}_{r}^{i}(\tau )\) and \(D_{r}^{i}(\tau )\) upon the admissible feedback policy gain \(K_{i}(\tau )\) and \(u_{-i}^{{\ast}}(\tau )\) suppressed)

$$\displaystyle{ \frac{d} {d\tau }H_{1}^{i}(\tau ) = -{({F}^{i})}^{T}(\tau )H_{ 1}^{i}(\tau ) - H_{ 1}^{i}(\tau ){F}^{i}(\tau ) - {N}^{i}(\tau ) }$$
(33)
$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }H_{r}^{i}(\tau )& =& -{({F}^{i})}^{T}(\tau )H_{ r}^{i}(\tau ) - H_{ r}^{i}(\tau ){F}^{i}(\tau ) \\ & & -\sum _{s=1}^{r-1} \frac{2r!} {s!(r - s)!}H_{s}^{i}(\tau ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )H_{ r-s}^{i}(\tau )\,,\quad 2 \leq r \leq {k}^{i}{}\end{array}$$
(34)

and

$$\displaystyle{ \frac{d} {d\tau }\breve{D}_{r}^{i}(\tau ) = -H_{ r}^{i}(\tau ){E}^{i}(\tau )u_{ -i}^{{\ast}}(\tau )\,,\qquad 1 \leq r \leq {k}^{i} }$$
(35)

and, finally,

$$\displaystyle{ \frac{d} {d\tau }D_{1}^{i}(\tau ) = -\mathrm{Tr}\left \{H_{ 1}^{i}(\tau ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \} + {(u_{ -i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau ) }$$
(36)
$$\displaystyle{ \frac{d} {d\tau }D_{r}^{i}(\tau ) = -\mathrm{Tr}\left \{H_{ r}^{i}(\tau ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \},\quad 2 \leq r \leq {k}^{i} }$$
(37)

whereby the terminal-value conditions are \(H_{1}^{i}(t_{f}) = N_{f}^{i}\), \(H_{r}^{i}(t_{f}) = 0\) for \(2 \leq r \leq {k}^{i}\), \(\breve{D}_{r}^{i}(t_{f}) = 0\) for \(1 \leq r \leq {k}^{i}\) and \(D_{r}^{i}(t_{f}) = 0\) for \(1 \leq r \leq {k}^{i}\).

Proof.

The expression of the performance-measure statistics described in (32) is readily justified by using the result (29) and the definitions (30)–(31). What remains is to show that the solutions \(H_{r}^{i}(\tau )\), \(\breve{D}_{r}^{i}(\tau )\) and \(D_{r}^{i}(\tau )\) for \(1 \leq r \leq {k}^{i}\) indeed satisfy the dynamical equations (33)–(37). These backward-in-time equations, satisfied by the matrix-valued \(H_{r}^{i}(\tau )\), vector-valued \(\breve{D}_{r}^{i}(\tau )\), and scalar-valued \(D_{r}^{i}(\tau )\) solutions, are obtained by successively taking derivatives with respect to θ of the supporting equations (22)–(24), subject to the assumptions that \((A_{i},B_{i})\) and \((A_{i},C_{i})\) are uniformly stabilizable and detectable on \([t_{0},t_{f}]\). □
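To make the backward-in-time recursion (33)–(37) concrete, the sketch below integrates the first two cumulant equations for a scalar, time-invariant stand-in (closed-loop coefficient F, noise intensity GWG playing the role of \(G^{i}W^{i}(G^{i})^{T}\), weights N and \(N_{f}\)); the coupling terms \(E^{i}u_{-i}^{{\ast}}\) are set to zero, so only (33)–(34) are active, and all numerical values are illustrative assumptions. In this scalar case (33) has the closed-form solution \(H_{1}(\tau ) = (N_{f} + N/(2F))e^{2F(t_{f}-\tau )} - N/(2F)\), which the fixed-step RK4 sweep reproduces:

```python
import math

# illustrative scalar data (assumptions, not values from the chapter)
F, GWG, N, N_f = -0.8, 0.5, 1.0, 0.2    # GWG plays the role of G W G^T
t0, tf, steps = 0.0, 2.0, 2000

def rhs(H):
    """Right-hand sides of (33)-(34) for k^i = 2 in the scalar case,
    with the coupling terms E^i u*_{-i} set to zero."""
    H1, H2 = H
    dH1 = -2.0 * F * H1 - N
    # the coefficient 2*r!/(s!(r-s)!) with r = 2, s = 1 equals 4
    dH2 = -2.0 * F * H2 - 4.0 * H1 * GWG * H1
    return (dH1, dH2)

def sweep_backward():
    """Fixed-step RK4 from the terminal conditions at tf down to t0."""
    h = (tf - t0) / steps
    H = (N_f, 0.0)                       # H1(tf) = N_f, H2(tf) = 0
    for _ in range(steps):
        k1 = rhs(H)
        k2 = rhs(tuple(x - 0.5 * h * k for x, k in zip(H, k1)))
        k3 = rhs(tuple(x - 0.5 * h * k for x, k in zip(H, k2)))
        k4 = rhs(tuple(x - h * k for x, k in zip(H, k3)))
        H = tuple(x - h / 6.0 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(H, k1, k2, k3, k4))
    return H

def H1_exact(tau):
    """Closed-form solution of (33) in the scalar case."""
    return (N_f + N / (2 * F)) * math.exp(2 * F * (tf - tau)) - N / (2 * F)
```

Note that \(H_{2}(t_{0}) > 0\) in this example, consistent with the second cumulant being a variance-type quantity.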

3 Problem Statements

The purpose of this section is to exploit the insight gained into the roles played by the performance-measure statistics of the generalized chi-squared performance measure (18) in constructing risk-averse Nash feedback strategies. The distributed optimization with Nash feedback policies considered here is distinguished by the fact that the time evolution of all the mathematical statistics (32) associated with the random performance measure (18) of the generalized chi-squared type is described by the matrix-, vector-, and scalar-valued backward-in-time differential equations (33)–(37).

For such problems it is important to have a compact statement of the risk-averse decision and control optimization so as to aid mathematical manipulation. To make this more precise, one may think of the \({k}^{i}\)-tuple state variables \({\mathcal{H}}^{i}(\cdot ) \triangleq (\mathcal{H}_{1}^{i}(\cdot ),\ldots,\mathcal{H}_{{k}^{i}}^{i}(\cdot ))\), \(\breve{{\mathcal{D}}}^{i}(\cdot ) \triangleq (\breve{\mathcal{D}}_{1}^{i}(\cdot ),\ldots,\breve{\mathcal{D}}_{{k}^{i}}^{i}(\cdot ))\) and \({\mathcal{D}}^{i}(\cdot ) \triangleq (\mathcal{D}_{1}^{i}(\cdot ),\ldots,\mathcal{D}_{{k}^{i}}^{i}(\cdot ))\), whose continuously differentiable states \(\mathcal{H}_{r}^{i} \in {\mathcal{C}}^{1}(t_{0},t_{f}; {\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})\), \(\breve{\mathcal{D}}_{r}^{i} \in {\mathcal{C}}^{1}(t_{0},t_{f}; {\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})\) and \(\mathcal{D}_{r}^{i} \in {\mathcal{C}}^{1}(t_{0},t_{f}; \mathbb{R})\) are defined on \([t_{0},t_{f}]\) with the representations \(\mathcal{H}_{r}^{i}(\cdot ) \triangleq H_{r}^{i}(\cdot )\), \(\breve{\mathcal{D}}_{r}^{i}(\cdot ) \triangleq \breve{ D}_{r}^{i}(\cdot )\) and \(\mathcal{D}_{r}^{i}(\cdot ) \triangleq D_{r}^{i}(\cdot )\), whose right members satisfy the dynamics (33)–(37). In the remainder of the development, the convenient mappings associated with incumbent agent i and \(i \in \mathcal{I}\) are introduced as follows

$$\displaystyle\begin{array}{rcl} \mathcal{F}_{r}^{i}: [t_{ 0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } \times {\mathbb{R}}^{m_{i}\times \sum _{j=1}^{N_{i}}n_{ j}}& & \mapsto {\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}} {}\\ \breve{\mathcal{G}}_{r}^{i}: [t_{ 0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} }& & \mapsto {\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}} {}\\ \mathcal{G}_{r}^{i}: [t_{ 0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} }& & \mapsto \mathbb{R}\,, {}\\ \end{array}$$

where the rules of action are given by

$$\displaystyle\begin{array}{rcl} \mathcal{F}_{1}^{i}(\tau,{\mathcal{H}}^{i},K_{ i})& & \triangleq -{({F}^{i})}^{T}(\tau )\mathcal{H}_{ 1}^{i}(\tau ) -\mathcal{H}_{ 1}^{i}(\tau ){F}^{i}(\tau ) - {N}^{i}(\tau ) {}\\ \mathcal{F}_{r}^{i}(\tau,{\mathcal{H}}^{i},K_{ i})& & \triangleq -{({F}^{i})}^{T}(\tau )\mathcal{H}_{ r}^{i}(\tau ) -\mathcal{H}_{ r}^{i}(\tau ){F}^{i}(\tau ) {}\\ & & \quad -\sum _{s=1}^{r-1}{ 2r! \over s!(r - s)!}\mathcal{H}_{s}^{i}(\tau ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\mathcal{H}_{ r-s}^{i}(\tau )\,,\quad 2 \leq r \leq {k}^{i} {}\\ \breve{\mathcal{G}}_{r}^{i}(\tau,{\mathcal{H}}^{i})& & \triangleq -\mathcal{H}_{ r}^{i}(\tau ){E}^{i}(\tau )u_{ -i}^{{\ast}}(\tau )\,,\quad 1 \leq r \leq {k}^{i} {}\\ \mathcal{G}_{1}^{i}(\tau,{\mathcal{H}}^{i})& & \triangleq -\mathrm{Tr}\left \{\mathcal{H}_{ 1}^{i}(\tau ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \} + {(u_{ -i}^{{\ast}})}^{T}(\tau )M_{ i}(\tau )u_{-i}^{{\ast}}(\tau ) {}\\ \mathcal{G}_{r}^{i}(\tau,{\mathcal{H}}^{i})& & \triangleq -\mathrm{Tr}\left \{\mathcal{H}_{ r}^{i}(\tau ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\right \},\quad 2 \leq r \leq {k}^{i}. {}\\ \end{array}$$

The product mappings that follow are needed for a compact formulation; that is,

$$\displaystyle\begin{array}{rcl} \mathcal{F}_{1}^{i} \times \cdots \times \mathcal{F}_{{ k}^{i}}^{i}: [t_{ 0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} }\! \times \! {\mathbb{R}}^{m_{i}\times \sum _{j=1}^{N_{i}}n_{ j}}& & \mapsto {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } {}\\ \breve{\mathcal{G}}_{1}^{i} \times \cdots \times \breve{\mathcal{G}}_{{ k}^{i}}^{i}: [t_{ 0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} }& & \mapsto {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } {}\\ \mathcal{G}_{1}^{i} \times \cdots \times \mathcal{G}_{{ k}^{i}}^{i}: [t_{ 0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} }& & \mapsto {\mathbb{R}}^{{k}^{i} } {}\\ \end{array}$$

whereby the corresponding notations

$$\displaystyle\begin{array}{rcl}{ \mathcal{F}}^{i}& & \triangleq \mathcal{F}_{ 1}^{i} \times \cdots \times \mathcal{F}_{{ k}^{i}}^{i} {}\\ \breve{{\mathcal{G}}}^{i}& & \triangleq \breve{\mathcal{G}}_{ 1}^{i} \times \cdots \times \breve{\mathcal{G}}_{{ k}^{i}}^{i} {}\\ {\mathcal{G}}^{i}& & \triangleq \mathcal{G}_{ 1}^{i} \times \cdots \times \mathcal{G}_{{ k}^{i}}^{i} {}\\ \end{array}$$

are used. Thus, the dynamical equations (33)–(37) can be rewritten as follows

$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }{\mathcal{H}}^{i}(\tau )& = {\mathcal{F}}^{i}(\tau,{\mathcal{H}}^{i}(\tau ),K_{ i}(\tau )),\qquad {\mathcal{H}}^{i}(t_{ f}) \equiv \mathcal{H}_{f}^{i}&{}\end{array}$$
(38)
$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }\breve{{\mathcal{D}}}^{i}(\tau )& =\breve{ {\mathcal{G}}}^{i}\left (\tau,{\mathcal{H}}^{i}(\tau )\right ),\qquad \breve{{\mathcal{D}}}^{i}(t_{ f}) \equiv \breve{\mathcal{D}}_{f}^{i}&{}\end{array}$$
(39)
$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }{\mathcal{D}}^{i}(\tau )& = {\mathcal{G}}^{i}\left (\tau,{\mathcal{H}}^{i}(\tau )\right ),\qquad {\mathcal{D}}^{i}(t_{ f}) \equiv \mathcal{D}_{f}^{i}&{}\end{array}$$
(40)

whereby the \({k}^{i}\)-tuple terminal-value conditions are \(\mathcal{H}_{f}^{i} \triangleq (N_{f}^{i},0,\ldots,0)\), \(\breve{\mathcal{D}}_{f}^{i} \triangleq (0,\ldots,0)\) and \(\mathcal{D}_{f}^{i} \triangleq (0,\ldots,0)\).
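Numerically, the compact form (38)–(40) is convenient precisely because the \({k}^{i}\)-tuples can be flattened into a single state vector and handed to one backward ODE sweep. A minimal sketch of the packing and unpacking (shapes and names are assumptions chosen for illustration, with n the dimension of each \(H_{r}^{i}\)):

```python
import numpy as np

def pack(H_list, Dbrv_list, D_list):
    """Flatten the tuples (H^i, D-breve^i, D^i) into one state vector."""
    parts = [H.ravel() for H in H_list] + [d.ravel() for d in Dbrv_list]
    parts.append(np.asarray(D_list, dtype=float))
    return np.concatenate(parts)

def unpack(v, k, n):
    """Invert pack() for k cumulants and state dimension n."""
    H_list, idx = [], 0
    for _ in range(k):
        H_list.append(v[idx:idx + n * n].reshape(n, n))
        idx += n * n
    Dbrv_list = []
    for _ in range(k):
        Dbrv_list.append(v[idx:idx + n])
        idx += n
    D_list = list(v[idx:idx + k])
    return H_list, Dbrv_list, D_list
```

With this round trip, any off-the-shelf integrator can evolve (38)–(40) as one vector field rather than three coupled matrix, vector, and scalar equations.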

Once the immediate neighbors \(j \in \mathcal{N}_{i}\) of incumbent agent i fix the control and decision parameters \(K_{j}^{{\ast}}\) of the person-by-person equilibrium strategies \(u_{j}^{{\ast}}\), and thus the interconnection effects \(u_{-i}^{{\ast}}\) underpinned by \(K_{-i}^{{\ast}}\), incumbent agent i obtains an optimal stochastic control problem with risk-averse performance considerations. The construction of agent i’s person-by-person policy also involves the control and decision parameter \(K_{i}\). In the sequel and elsewhere, whenever the dependence on \(K_{i}\) and \(K_{-i}^{{\ast}}\) needs to be made explicit, the notations

$$\displaystyle{ {\mathcal{H}}^{i} \equiv {\mathcal{H}}^{i}(\cdot,K_{ i},K_{-i}^{{\ast}}) }$$
$$\displaystyle{ \breve{{\mathcal{D}}}^{i} \equiv \breve{{\mathcal{D}}}^{i}(\cdot,K_{ i},K_{-i}^{{\ast}}) }$$
$$\displaystyle{ {\mathcal{D}}^{i} \equiv {\mathcal{D}}^{i}(\cdot,K_{ i},K_{-i}^{{\ast}}) }$$

are used to denote the solution trajectories of the dynamics (38)–(40) with the admissible 2-tuple \((K_{i},K_{-i}^{{\ast}})\).

For the given terminal data \((t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i})\), the class of admissible feedback gains employed by incumbent agent i and \(i \in \mathcal{I}\) is next defined.

Definition 1 (Admissible Feedback Policy Gains). 

Let compact subset \({\overline{K}}^{i} \subset {\mathbb{R}}^{m_{i}\times n}\) be the set of allowable feedback form values. For the given \({k}^{i} \in \mathbb{N}\) and sequence \({\mu }^{i} =\{\mu _{ r}^{i} \geq 0\}_{r=1}^{{k}^{i} }\) with \(\mu _{1}^{i} > 0\), the set of feedback gains \(\mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\) is assumed to be the class of \(\mathcal{C}(t_{0},t_{f}; {\mathbb{R}}^{m_{i}\times \sum _{j=1}^{N_{i}}n_{ j}})\) with values \(K_{i}(\cdot ) \in {\overline{K}}^{i}\), for which the solutions to the dynamical equations (38)–(40) with the terminal-value conditions \({\mathcal{H}}^{i}(t_{f}) = \mathcal{H}_{f}^{i}\), \(\breve{{\mathcal{D}}}^{i}(t_{f}) =\breve{ \mathcal{D}}_{f}^{i}\) and \({\mathcal{D}}^{i}(t_{f}) = \mathcal{D}_{f}^{i}\) exist on the interval of optimization \([t_{0},t_{f}]\).

One way to make sense of the risk borne by incumbent agent i is to assess the performance vulnerability of (18) against all sample-path realizations of the local environment and the potential noncooperative influences \(u_{-i}^{{\ast}}\) from immediate neighbors j and \(j \in \mathcal{N}_{i}\). The mechanism adopted here, a finite set of selective weights placed on the mathematical statistics of (18), helps to unfold the complexity behind observed performance values and the risks induced by person-by-person strategy dependence in the following formulation of a risk-value aware performance index. Notice that this custom set of design freedoms, each representing a particular uncertainty aversion, is different from the aversion to risk captured in risk-sensitive optimal control [9, 10].

On \(\mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\) the performance index with risk-value considerations in risk-averse decision making is subsequently defined as follows.

Definition 2 (Risk-Value Aware Performance Index). 

Let incumbent agent i and \(i \in \mathcal{I}\) select \({k}^{i} \in \mathbb{N}\) and the sequence of scalar coefficients \({\mu }^{i} =\{\mu _{ r}^{i} \geq 0\}_{r=1}^{{k}^{i} }\) with \(\mu _{1}^{i} > 0\). Then, the risk-value aware performance index

$$\displaystyle{ \phi _{0}^{i}:\{ t_{ 0}\} \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } \times {\mathbb{R}}^{{k}^{i} }\mapsto {\mathbb{R}}^{+} }$$

pertaining to risk-averse decision making of the stochastic Nash game over \([t_{0},t_{f}]\) is defined by

$$\displaystyle\begin{array}{rcl} \phi _{0}^{i}(t_{ 0},{\mathcal{H}}^{i}(t_{ 0}),\breve{{\mathcal{D}}}^{i}(t_{ 0}),{\mathcal{D}}^{i}(t_{ 0}))& & \triangleq \underbrace{\mathop{\mu _{1}^{i}\kappa _{ 1}^{i}}}\limits _{ \text{ Value Measure }} +\underbrace{\mathop{ \mu _{2}^{i}\kappa _{ 2}^{i} +\ldots +\mu _{{ k}^{i}}^{i}\kappa _{{ k}^{i}}^{i}}}\limits _{ \text{ Risk\ Measures }} \\ & & =\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}[{(z_{ 0}^{i})}^{T}\mathcal{H}_{ r}^{i}(t_{ 0})z_{0}^{i} + 2{(z_{ 0}^{i})}^{T}\breve{\mathcal{D}}_{ r}^{i}(t_{ 0}) + \mathcal{D}_{r}^{i}(t_{ 0})]\,,{}\end{array}$$
(41)

where the additional design freedoms \(\mu _{r}^{i}\), utilized by incumbent agent i with risk-averse attitudes, are sufficient to meet and exceed different levels of performance-based reliability requirements, for instance, mean (the average of the performance measure), variance (the dispersion of the performance measure around its mean), skewness (the asymmetry of the density of the performance measure), and kurtosis (the heaviness of the density tails of the performance measure), pertaining to closed-loop performance variations and uncertainties. The supporting solutions \(\{\mathcal{H}_{r}^{i}(\tau )\}_{r=1}^{{k}^{i} }\), \(\{\breve{\mathcal{D}}_{r}^{i}(\tau )\}_{r=1}^{{k}^{i} }\) and \(\{\mathcal{D}_{r}^{i}(\tau )\}_{r=1}^{{k}^{i} }\), evaluated at \(\tau = t_{0}\), satisfy the dynamical equations (38)–(40).
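In plain terms, (41) is a scalarization: the first cumulant (the mean) is the value measure, and each higher cumulant enters as a risk measure with its own nonnegative weight. A minimal sketch, with hypothetical weights and with the state-dependent quadratic and linear contributions already folded into the cumulants \(\kappa _{r}\):

```python
def risk_value_index(mu, kappa):
    """phi_0 = sum_r mu_r * kappa_r, per Definition 2 (mu_1 > 0 required)."""
    if len(mu) != len(kappa):
        raise ValueError("one weight per cumulant is required")
    if mu[0] <= 0.0:
        raise ValueError("the value-measure weight mu_1 must be positive")
    return sum(m * k for m, k in zip(mu, kappa))

# hypothetical cumulants of a chi-squared performance measure, 3 dof
kappa = [3.0, 6.0, 24.0]
# a risk-neutral agent weights only the mean; a risk-averse one also
# penalizes variance (kappa_2) and the third cumulant (kappa_3)
neutral = risk_value_index([1.0, 0.0, 0.0], kappa)
averse = risk_value_index([1.0, 0.5, 0.1], kappa)
```

The risk-averse index exceeds the risk-neutral one whenever any higher cumulant is positive, which is exactly the premium paid for dispersion and tail heaviness.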

To specifically indicate the dependence of the risk-value aware performance index (41) expressed in Mayer form on K i and the signaling effects \(u_{-i}^{{\ast}}\) or \(K_{-i}^{{\ast}}\) issued by all immediate neighbors j from \(\mathcal{N}_{i}\), the multi-attribute utility function or performance index (41) for incumbent agent i is now rewritten explicitly as \(\phi _{0}^{i}(K_{i},K_{-i}^{{\ast}})\).

Definition 3 (Nash Equilibrium Solution). 

An admissible set of feedback strategies \((K_{1}^{{\ast}},\ldots,K_{N_{i}}^{{\ast}})\) is a Nash equilibrium for the local N i -person game, where each incumbent agent i and \(i \in \mathcal{I}\) has the performance index \(\phi _{0}^{i}(K_{i},K_{-i}^{{\ast}})\) of Mayer type, if for all admissible feedback strategies \((K_{1},\ldots,K_{N_{i}})\) the following inequalities hold

$$\displaystyle{ \phi _{0}^{i}(K_{ i}^{{\ast}},K_{ -i}^{{\ast}}) \leq \phi _{ 0}^{i}(K_{ i},K_{-i}^{{\ast}})\,. }$$

For the sake of time consistency and subgame perfection, a Nash equilibrium solution is required to have the additional property that its restriction to the interval \([t_{0},\tau ]\) is also a Nash solution to the truncated version of the original problem, defined on \([t_{0},\tau ]\). With this restriction in place, the Nash equilibrium solution is termed a feedback Nash equilibrium solution, which is free of informational nonuniqueness and whose derivation therefore admits a dynamic programming type argument.

Definition 4 (Feedback Nash Equilibrium). 

Let \(K_{i}^{{\ast}}\) constitute a feedback Nash strategy which will be implemented by incumbent agent i such that

$$\displaystyle{ \phi _{0}^{i}(K_{ i}^{{\ast}},K_{ -i}^{{\ast}}) \leq \phi _{ 0}^{i}(K_{ i},K_{-i}^{{\ast}})\,,\qquad i \in \mathcal{I} }$$
(42)

for all admissible \(K_{i} \in \mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\), upon which the solutions to the dynamical systems (38)–(40) exist on \([t_{0},t_{f}]\).

Then, \(\left (K_{1}^{{\ast}},\ldots,K_{N_{i}}^{{\ast}}\right )\) when restricted to the interval [t 0, τ] is still an N i -tuple feedback Nash equilibrium solution for the multiperson Nash decision problem with the appropriate terminal-value condition \((\tau,\mathcal{H}_{{\ast}}^{i}(\tau ),\breve{\mathcal{D}}_{{\ast}}^{i}(\tau ),\mathcal{D}_{{\ast}}^{i}(\tau ))\) for all τ ∈ [t 0, t f ].

In conformity with the rigorous formulation of dynamic programming, the following development is important. Let the terminal time \(t_{f}\) and 3-tuple states \((\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i})\) be given; the other end condition, involving the initial time \(t_{0}\) and 3-tuple states \((\mathcal{H}_{0}^{i},\breve{\mathcal{D}}_{0}^{i},\mathcal{D}_{0}^{i})\), is then specified by a target set requirement.

Definition 5 (Target Sets). 

\((t_{0},\mathcal{H}_{0}^{i},\breve{\mathcal{D}}_{0}^{i},\mathcal{D}_{0}^{i}) \in \hat{{\mathcal{M}}}^{i}\), where the target set \(\hat{{\mathcal{M}}}^{i}\) residing at incumbent agent i and \(i \in \mathcal{I}\) is a closed subset of \([t_{0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } \times {\mathbb{R}}^{{k}^{i} }\).

Now, the decision optimization residing at incumbent agent i and \(i \in \mathcal{I}\) is to minimize the risk-value aware performance index (41) over all admissible feedback strategies \(K_{i} = K_{i}(\cdot )\) in \(\mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\) while subject to potential interferences from all immediate neighbors with the feedback Nash policies \(K_{-i}^{{\ast}}\).

Definition 6 (Optimization of Mayer Problem). 

Given the sequence of scalars \({\mu }^{i} =\{\mu _{ r}^{i} \geq 0\}_{r=1}^{{k}^{i} }\) with \(\mu _{1}^{i} > 0\), the decision optimization on \([t_{0},t_{f}]\) associated with incumbent agent i and \(i \in \mathcal{I}\) is given by

$$\displaystyle{ \min _{K_{i}(\cdot )\in \mathcal{K}_{ t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}}\phi _{0}^{i}(K_{ i},K_{-i}^{{\ast}})\,, }$$
(43)

subject to the dynamical equations (38)–(40) on \([t_{0},t_{f}]\).
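The shape of the Mayer problem (43) can be previewed with a deliberately small stand-in: a scalar plant with closed-loop coefficient \(F = A - BK\), a single cumulant (\(\mu _{1} = 1\), \({k}^{i} = 1\)), and a brute-force search over a grid of constant gains K in a compact set, echoing Definition 1. All numerical values, and the cost weighting \(q = N + MK^{2}\) mimicking a quadratic cost under a hypothetical feedback law \(u = -Kz\), are assumptions for illustration only; the point is the pipeline gain → backward sweep → index (41):

```python
import math

# illustrative scalar data (all values are assumptions)
A, B, N, M, N_f = 0.4, 1.0, 1.0, 0.25, 0.1
z0, t0, tf = 1.0, 0.0, 1.0

def H1_at_t0(K):
    """Solve dH/dtau = -2*F*H - q backward from H(tf) = N_f in closed
    form, with F = A - B*K and q = N + M*K**2 (hypothetical weighting)."""
    F = A - B * K
    q = N + M * K * K
    T = tf - t0
    if abs(F) < 1e-9:                   # limiting case F -> 0
        return N_f + q * T
    c = N_f + q / (2.0 * F)
    return c * math.exp(2.0 * F * T) - q / (2.0 * F)

def index(K):
    """phi_0 = mu_1 * kappa_1 with mu_1 = 1 and kappa_1 = z0^2 * H1(t0)."""
    return z0 * z0 * H1_at_t0(K)

# brute-force search over a compact set of allowable gains
grid = [k / 100.0 for k in range(-200, 401)]    # K in [-2, 4]
K_star = min(grid, key=index)
```

Gains that leave the plant unstable inflate the index through the exponential term, while overly aggressive gains pay through the control weighting, so the minimizer sits in the interior of the grid.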

Notice that the optimization considered here is in Mayer form and can be solved by applying an adaptation of the Mayer form verification results as given in [11]. To embed this optimization facing incumbent agent i into a larger problem, the terminal time and states \((t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i})\) are parameterized as \((\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\), whereby \({\mathcal{Y}}^{i} \triangleq {\mathcal{H}}^{i}(\varepsilon )\), \(\breve{{\mathcal{Z}}}^{i} \triangleq \breve{{\mathcal{D}}}^{i}(\varepsilon )\) and \({\mathcal{Z}}^{i} \triangleq {\mathcal{D}}^{i}(\varepsilon )\). Thus, the value function for this optimization problem now depends on the parameterization of the terminal-value conditions.

Definition 7 (Value Function). 

Suppose \((\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}) \in [t_{0},t_{f}] \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } \times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} } \times {\mathbb{R}}^{{k}^{i} }\) is given and fixed. Then, the value function \({\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) and \(i \in \mathcal{I}\) is defined by

$$\displaystyle{{ \mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}) \triangleq \inf _{K_{i}(\cdot )\in \mathcal{K}_{\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}{;\mu }^{i}}^{i}}\phi _{0}^{i}(K_{ i},K_{-i}^{{\ast}})\,. }$$

For convention, \({\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}) \triangleq \infty \) when \(\mathcal{K}_{\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}{;\mu }^{i}}^{i}\) is empty. Next, some candidates for the value function are constructed with the help of the concept of reachable set.

Definition 8 (Reachable Sets). 

Let the reachable set associated with incumbent agent i be \({\mathcal{Q}}^{i} \triangleq \{ (\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}) \in [t_{0},t_{f}]\times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} }\times {({\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})}^{{k}^{i} }\times {\mathbb{R}}^{{k}^{i} }\) such that \(\mathcal{K}_{\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}{;\mu }^{i}}^{i}\neq \varnothing \}\).

Moreover, it can be shown that the value function associated with incumbent agent i satisfies a partial differential equation at interior points of \({\mathcal{Q}}^{i}\) at which it is differentiable.

Theorem 3 (Hamilton–Jacobi–Bellman (HJB) Equation-Mayer Problem). 

Let \((\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) be any interior point of the reachable set \({\mathcal{Q}}^{i}\) and \(i \in \mathcal{I}\) , at which the value function \({\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) is differentiable. If there exists a feedback Nash strategy \(K_{i}^{{\ast}}\in \mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\) , then the differential equation

$$\displaystyle\begin{array}{rcl} 0& =& \min _{K_{ i}\in {\overline{K}}^{i}}\Bigg\{{\partial \, \over \partial \varepsilon }{\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}) \\ & & \qquad \qquad +{ \partial \over \partial \,\mathrm{vec}({\mathcal{Y}}^{i})}{\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\mathrm{vec}({\mathcal{F}}^{i}(\varepsilon,{\mathcal{Y}}^{i},K_{ i})) \\ & & \qquad \qquad +{ \partial \over \partial \,\mathrm{vec}(\breve{{\mathcal{Z}}}^{i})}{\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\mathrm{vec}({\breve{\mathcal{G}}}^{i}(\varepsilon,{\mathcal{Y}}^{i})) \\ & & \qquad \qquad +{ \partial \over \partial \,\mathrm{vec}({\mathcal{Z}}^{i})}{\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\mathrm{vec}({\mathcal{G}}^{i}(\varepsilon,{\mathcal{Y}}^{i}))\Bigg\} {}\end{array}$$
(44)

is satisfied whereby \({\mathcal{V}}^{i}(t_{0},{\mathcal{Y}}^{i}(t_{0}),{\breve{\mathcal{Z}}}^{i}(t_{0}),{\mathcal{Z}}^{i}(t_{0})) =\phi _{ 0}^{i}(t_{0},{\mathcal{H}}^{i}(t_{0}),{\breve{\mathcal{D}}}^{i}(t_{0}),{\mathcal{D}}^{i}(t_{0}))\) .

Proof.

By the recent results of the first author [12], the result herein is readily proven.

Finally, the following result gives a sufficient condition for verifying a feedback Nash strategy for incumbent agent i and \(i \in \mathcal{I}\). □ 

Theorem 4 (Verification Theorem). 

Let \({\mathcal{W}}^{i}(\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) associated with incumbent agent i and \(i \in \mathcal{I}\) be a continuously differentiable solution of the HJB equation (44) that satisfies the boundary condition

$$\displaystyle{ {\mathcal{W}}^{i}(t_{ 0},{\mathcal{H}}^{i}(t_{ 0}),{\breve{\mathcal{D}}}^{i}(t_{ 0}),{\mathcal{D}}^{i}(t_{ 0})) =\phi _{ 0}^{i}(t_{ 0},{\mathcal{H}}^{i}(t_{ 0}),{\breve{\mathcal{D}}}^{i}(t_{ 0}),{\mathcal{D}}^{i}(t_{ 0}))\,. }$$

Let \((t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}) \in {\mathcal{Q}}^{i}\) ; let \(K_{i} \in \mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\) ; and let \(({\mathcal{H}}^{i}(\cdot ),{\breve{\mathcal{D}}}^{i}(\cdot ),{\mathcal{D}}^{i}(\cdot ))\) be the trajectory solutions of the dynamical equations (38)–(40). Then, the scalar-valued function \({\mathcal{W}}^{i}(\tau,{\mathcal{H}}^{i}(\tau ),{\breve{\mathcal{D}}}^{i}(\tau ),{\mathcal{D}}^{i}(\tau ))\) is a time-backward increasing function of τ on \([t_{0},t_{f}]\) .

If \(K_{i}^{{\ast}}\) is in \(\mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\) with the corresponding solutions \((\mathcal{H}_{{\ast}}^{i}(\cdot ),\breve{\mathcal{D}}_{{\ast}}^{i}(\cdot ),\mathcal{D}_{{\ast}}^{i}(\cdot ))\) of the dynamical equations (38)–(40) such that, for \(\tau \in [t_{0},t_{f}]\)

$$\displaystyle\begin{array}{rcl} 0& =&{ \partial \, \over \partial \varepsilon }{\mathcal{W}}^{i}(\tau,\mathcal{H}_{ {\ast}}^{i}(\tau ),\breve{\mathcal{D}}_{ {\ast}}^{i}(\tau ),\mathcal{D}_{ {\ast}}^{i}(\tau )) \\ & & \quad +{ \partial \over \partial \,\mathrm{vec}({\mathcal{Y}}^{i})}{\mathcal{W}}^{i}(\tau,\mathcal{H}_{ {\ast}}^{i}(\tau ),\breve{\mathcal{D}}_{ {\ast}}^{i}(\tau ),\mathcal{D}_{ {\ast}}^{i}(\tau ))\mathrm{vec}({\mathcal{F}}^{i}(\tau,\mathcal{H}_{ {\ast}}^{i}(\tau ),K_{ i}^{{\ast}}(\tau ))) \\ & & \quad +{ \partial \over \partial \,\mathrm{vec}(\breve{{\mathcal{Z}}}^{i})}{\mathcal{W}}^{i}(\tau,\mathcal{H}_{ {\ast}}^{i}(\tau ),\breve{\mathcal{D}}_{ {\ast}}^{i}(\tau ),\mathcal{D}_{ {\ast}}^{i}(\tau ))\mathrm{vec}({\breve{\mathcal{G}}}^{i}(\tau,\mathcal{H}_{ {\ast}}^{i}(\tau ))) \\ & & \quad +{ \partial \over \partial \,\mathrm{vec}({\mathcal{Z}}^{i})}{\mathcal{W}}^{i}(\tau,\mathcal{H}_{ {\ast}}^{i}(\tau ),\breve{\mathcal{D}}_{ {\ast}}^{i}(\tau ),\mathcal{D}_{ {\ast}}^{i}(\tau ))\mathrm{vec}({\mathcal{G}}^{i}(\tau,\mathcal{H}_{ {\ast}}^{i}(\tau ))) {}\end{array}$$
(45)

then \(K_{i}^{{\ast}}\) is a feedback Nash strategy in \(\mathcal{K}_{t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i}{;\mu }^{i}}^{i}\) and

$$\displaystyle{ {\mathcal{W}}^{i}(\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i}) = {\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\,, }$$
(46)

where \({\mathcal{V}}^{i}(\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) is the value function associated with incumbent agent i.

Proof.

With the aid of the recent development in [12], the proof of the verification theorem herein then follows. □ 

4 Person-by-Person Equilibrium Strategies

The aim of the present section is to recognize the optimization problem of Mayer form residing at each incumbent agent i, \(i \in \mathcal{I}\), which can therefore be solved by an adaptation of the Mayer-form verification theorem. To this end, the terminal time and states \((t_{f},\mathcal{H}_{f}^{i},\breve{\mathcal{D}}_{f}^{i},\mathcal{D}_{f}^{i})\) of the dynamics (38)–(40) are now parameterized as \((\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) for a broader family of optimization problems.

To apply the dynamic programming approach based on the HJB mechanism, together with the verification result, the solution procedure is formulated as follows. For any given interior point \((\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) of the reachable set \({\mathcal{Q}}^{i}\) with \(i \in \mathcal{I}\), a real-valued function \({\mathcal{W}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) is considered as a candidate solution to the HJB equation (44). Because the arbitrarily fixed initial state \(z_{0}^{i}\) contributes both quadratic and linear terms to the performance index (41) of Mayer type, the value function must be quadratic and linear in \(z_{0}^{i}\). Thus, a candidate function \({\mathcal{W}}^{i} \in {\mathcal{C}}^{1}(t_{0},t_{f}; \mathbb{R})\) for the value function is of the form

$$\displaystyle\begin{array}{rcl}{ \mathcal{W}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})& =& {(z_{ 0}^{i})}^{T}\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}(\mathcal{Y}_{ r}^{i} + \mathcal{E}_{ r}^{i}(\varepsilon ))\,z_{ 0}^{i} \\ & & +2{(z_{0}^{i})}^{T}\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}(\breve{\mathcal{Z}}_{ r}^{i} +\breve{ \mathcal{T}}_{ r}^{i}(\varepsilon )) +\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}(\mathcal{Z}_{ r}^{i} + \mathcal{T}_{ r}^{i}(\varepsilon )){}\end{array}$$
(47)

whereby the parametric functions of time \(\mathcal{E}_{r}^{i} \in {\mathcal{C}}^{1}(t_{0},t_{f}; {\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}\times 2\sum _{j=1}^{N_{i}}n_{ j}})\), \(\breve{\mathcal{T}}_{r}^{i} \in {\mathcal{C}}^{1}(t_{0},t_{f}; {\mathbb{R}}^{2\sum _{j=1}^{N_{i}}n_{ j}})\), and \(\mathcal{T}_{r}^{i} \in {\mathcal{C}}^{1}([t_{0},t_{f}]; \mathbb{R})\) are yet to be determined.
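As a concrete check of the quadratic-affine form (47), the candidate value function can be evaluated numerically once trial values for its ingredients are supplied. The following Python sketch merely mirrors the symbols of (47); all sample data are hypothetical, and the helper name `candidate_W` is introduced here purely for illustration.

```python
import numpy as np

# Hedged sketch of evaluating the candidate value function (47) at one point.
# z0 plays the role of z_0^i; mu are the weights mu_r^i; the paired lists hold
# (Y_r, E_r(eps)), (Zb_r, Tb_r(eps)), and (Z_r, T_r(eps)) -- all hypothetical.
def candidate_W(z0, mu, Y_list, E_list, Zb_list, Tb_list, Z_list, T_list):
    quad = sum(m * (Y + E) for m, Y, E in zip(mu, Y_list, E_list))
    lin = sum(m * (Zb + Tb) for m, Zb, Tb in zip(mu, Zb_list, Tb_list))
    const = sum(m * (Z + T) for m, Z, T in zip(mu, Z_list, T_list))
    # (z0)^T quad z0 + 2 (z0)^T lin + const, exactly as in (47)
    return float(z0 @ quad @ z0 + 2 * z0 @ lin + const)
```

For instance, with a single cumulant term (k = 1), identity quadratic part, and a simple affine part, the three contributions add directly.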

Moreover, it can be shown that the derivative of \({\mathcal{W}}^{i}(\varepsilon,{\mathcal{Y}}^{i},{\breve{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})\) with respect to time \(\varepsilon\) is

$$\displaystyle\begin{array}{rcl}{ d \over d\varepsilon }{\mathcal{W}}^{i}(\varepsilon,{\mathcal{Y}}^{i},\breve{{\mathcal{Z}}}^{i},{\mathcal{Z}}^{i})& =& {(z_{ 0}^{i})}^{T}\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}[\mathcal{F}_{ r}^{i}(\varepsilon,{\mathcal{Y}}^{i},K_{ i}) +{ d \over d\varepsilon }\mathcal{E}_{r}^{i}(\varepsilon )]z_{ 0}^{i} \\ & & \quad + 2{(z_{0}^{i})}^{T}\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}[\breve{\mathcal{G}}_{ r}^{i}(\varepsilon,{\mathcal{Y}}^{i}\big) +{ d \over d\varepsilon }\breve{\mathcal{T}}_{r}^{i}(\varepsilon )] \\ & & \quad +\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}[\mathcal{G}_{ r}^{i}(\varepsilon,{\mathcal{Y}}^{i}) +{ d \over d\varepsilon }\mathcal{T}_{r}^{i}(\varepsilon )]\,. {}\end{array}$$
(48)

The substitution of this candidate (47) for the value function into the HJB equation (44) and making use of (48) yield

$$\displaystyle\begin{array}{rcl} 0& =& \min _{K_{ i}\in {\overline{K}}^{i}}\Big\{{(z_{0}^{i})}^{T}\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}[\mathcal{F}_{ r}^{i}(\varepsilon,{\mathcal{Y}}^{i},K_{ i}) +{ d \over d\varepsilon }\mathcal{E}_{r}^{i}(\varepsilon )]z_{ 0}^{i} \\ & & \quad + 2{(z_{0}^{i})}^{T}\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}[\breve{\mathcal{G}}_{ r}^{i}(\varepsilon,{\mathcal{Y}}^{i}\big) +{ d \over d\varepsilon }\breve{\mathcal{T}}_{r}^{i}(\varepsilon )] +\sum _{ r=1}^{{k}^{i} }\mu _{r}^{i}[\mathcal{G}_{ r}^{i}(\varepsilon,{\mathcal{Y}}^{i}) +{ d \over d\varepsilon }\mathcal{T}_{r}^{i}(\varepsilon )]\Big\}.{}\end{array}$$
(49)

Next, the matrix coefficients \({F}^{i}(\cdot )\) and \({N}^{i}(\cdot )\) of the aggregate dynamics (17) are partitioned to conform with the n-dimensional structure of (6) by means of

$$\displaystyle{ I_{0}^{T} \triangleq \left [\begin{array}{cc} I &0 \end{array} \right ]\,,\qquad I_{ 1}^{T} \triangleq \left [\begin{array}{cc} 0&I \end{array} \right ]\,, }$$

where I is the \(\sum _{j=1}^{N_{i}}n_{j} \times \sum _{j=1}^{N_{i}}n_{j}\) identity matrix and

$$\displaystyle{{ F}^{i}(\cdot ) = I_{ 0}(A_{i}(\cdot )+B_{i}(\cdot )K_{i}(\cdot ))I_{0}^{T}+I_{ 0}L_{i}(\cdot )C_{i}(\cdot )I_{1}^{T}+I_{ 1}(A_{i}(\cdot )-L_{i}(\cdot )C_{i}(\cdot ))I_{1}^{T} }$$
(50)
$$\displaystyle{{ N}^{i}(\cdot ) = I_{ 0}(Q_{i}(\cdot )+K_{i}^{T}(\cdot )R_{ i}(\cdot )K_{i}(\cdot ))I_{0}^{T}+I_{ 0}Q_{i}(\cdot )I_{1}^{T}+I_{ 1}Q_{i}(\cdot )I_{0}^{T}+I_{ 1}Q_{i}(\cdot )I_{1}^{T}\,. }$$
(51)
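The partition (50)–(51) is purely block-matrix bookkeeping and can be sanity-checked numerically. The sketch below assembles \(F^i\) and \(N^i\) from randomly generated, hypothetical matrices of small dimension; only the block structure is meaningful, and none of the sample values come from the text.

```python
import numpy as np

# Hypothetical small dimension: n stands in for sum_j n_j of the text.
n = 2
I = np.eye(n)
Z = np.zeros((n, n))
I0 = np.vstack([I, Z])  # so that I0^T = [I 0]
I1 = np.vstack([Z, I])  # so that I1^T = [0 I]

rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))   # stand-in for A_i
B = rng.standard_normal((n, 1))   # stand-in for B_i
C = rng.standard_normal((1, n))   # stand-in for C_i
K = rng.standard_normal((1, n))   # stand-in for K_i
L = rng.standard_normal((n, 1))   # stand-in for the filter gain L_i
Q = np.eye(n)                     # stand-in for Q_i
R = np.eye(1)                     # stand-in for R_i

# Aggregate closed-loop coefficient, per (50)
F = I0 @ (A + B @ K) @ I0.T + I0 @ (L @ C) @ I1.T + I1 @ (A - L @ C) @ I1.T
# Aggregate weighting, per (51)
N = I0 @ (Q + K.T @ R @ K) @ I0.T + I0 @ Q @ I1.T + I1 @ Q @ I0.T + I1 @ Q @ I1.T
```

The top-left block of \(F\) recovers \(A_i + B_i K_i\), the bottom-right block recovers \(A_i - L_i C_i\), and \(N\) comes out symmetric whenever \(Q_i\) and \(R_i\) are.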

Taking the gradient with respect to K i of the expression within the brackets of (49) yields the necessary condition for an extremum of the risk-value performance index (41) on the time interval \([t_{0},\varepsilon ]\)

$$\displaystyle{ K_{i} = -R_{i}^{-1}(\varepsilon )B_{ i}^{T}(\varepsilon )I_{ 0}^{T}\sum _{ r=1}^{{k}^{i} }\hat{\mu }_{r}^{i}\mathcal{Y}_{ r}^{i}\,I_{ 0}{({(I_{0}^{T}I_{ 0})}^{-1})}^{T}\,,\quad i \in \mathcal{I} }$$
(52)

where \(\hat{\mu }_{r}^{i} \triangleq \mu _{r}^{i}/\mu _{1}^{i}\) with \(\mu _{1}^{i} > 0\). With the feedback Nash strategy (52) substituted into the bracketed expression of (49) and with \(\left \{\mathcal{Y}_{r}^{i}\right \}_{r=1}^{{k}^{i} }\) evaluated along the optimal solution trajectories of (38)–(40), the time-dependent functions \(\mathcal{E}_{r}^{i}(\varepsilon )\), \(\breve{\mathcal{T}}_{r}^{i}(\varepsilon )\), and \(\mathcal{T}_{r}^{i}(\varepsilon )\) are chosen so that the sufficient condition (45) of the verification theorem is satisfied for arbitrary values of \(z_{0}^{i}\); for example,

$$\displaystyle\begin{array}{rcl}{ d \over d\varepsilon }\mathcal{E}_{1}^{i}(\varepsilon )& =& {(F_{ {\ast}}^{i})}^{T}(\varepsilon )\mathcal{H}_{ 1{\ast}}^{i}(\varepsilon ) + \mathcal{H}_{ 1{\ast}}^{i}(\varepsilon )F_{ {\ast}}^{i}(\varepsilon ) + N_{ {\ast}}^{i}(\varepsilon ) {}\\ {d \over d\varepsilon }\mathcal{E}_{r}^{i}(\varepsilon )& =& {(F_{ {\ast}}^{i})}^{T}(\varepsilon )\mathcal{H}_{ r{\ast}}^{i}(\varepsilon ) + \mathcal{H}_{ r{\ast}}^{i}(\varepsilon )F_{ {\ast}}^{i}(\varepsilon ) {}\\ & & \quad +\sum _{ s=1}^{r-1}{ 2r! \over s!(r - s)!}\mathcal{H}_{s{\ast}}^{i}(\varepsilon ){G}^{i}(\varepsilon ){W}^{i}{({G}^{i})}^{T}(\varepsilon )\mathcal{H}_{ r-s{\ast}}^{i}(\varepsilon ),\quad 2 \leq r \leq {k}^{i} {}\\ \end{array}$$

and

$$\displaystyle{{ d \over d\varepsilon }\breve{\mathcal{T}}_{r}^{i}(\varepsilon ) = \mathcal{H}_{ r{\ast}}^{i}(\varepsilon ){E}^{i}(\varepsilon )u_{ -i}^{{\ast}}(\varepsilon )\,,\qquad 1 \leq r \leq {k}^{i} }$$

and, finally

$$\displaystyle\begin{array}{lll}{ d \over d\varepsilon }\mathcal{T}_{1}^{i}(\varepsilon )& =\mathrm{ Tr}\left \{\mathcal{H}_{ 1{\ast}}^{i}(\varepsilon ){G}^{i}(\varepsilon ){W}^{i}{({G}^{i})}^{T}(\varepsilon )\right \} - {(u_{ -i}^{{\ast}})}^{T}(\varepsilon )M_{ i}(\varepsilon )u_{-i}^{{\ast}}(\varepsilon )& {}\\ {d \over d\varepsilon }\mathcal{T}_{r}^{i}(\varepsilon )& =\mathrm{ Tr}\left \{\mathcal{H}_{ r{\ast}}^{i}(\varepsilon ){G}^{i}(\varepsilon ){W}^{i}{({G}^{i})}^{T}(\varepsilon )\right \},\quad 2 \leq r \leq {k}^{i} & {}\\ \end{array}$$

with the initial-value conditions \(\mathcal{E}_{r}^{i}(t_{0}) = 0\), \(\breve{\mathcal{T}}_{r}^{i}(t_{0}) = 0\), and \(\mathcal{T}_{r}^{i}(t_{0}) = 0\) for \(1 \leq r \leq {k}^{i}\). The sufficient condition (45) of the verification theorem is therefore satisfied, so the extremizing feedback strategy (52) is optimal.
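For illustration, the extremizing gain (52) amounts to a weighted combination of the cumulant-supporting solutions, pre- and post-multiplied by the partition matrices; note that \(I_{0}^{T}I_{0}\) is the identity, so the trailing inverse factor is benign. A minimal sketch, with the helper name `nash_gain` and all inputs hypothetical:

```python
import numpy as np

# Hedged sketch of the gain formula (52): R, B, I0 stand in for R_i, B_i, I_0,
# H_list for the solutions Y_r^i, and mu_hat for the weights mu-hat_r^i.
def nash_gain(R, B, I0, H_list, mu_hat):
    S = sum(m * H for m, H in zip(mu_hat, H_list))  # weighted cumulant sum
    # (I0^T I0) is the identity, so the last factor is included only for
    # fidelity to the formula as printed.
    return -np.linalg.inv(R) @ B.T @ I0.T @ S @ I0 @ np.linalg.inv(I0.T @ I0).T
```

In the scalar case the gain simply reads off the top-left entry of the weighted sum, with the sign flipped and scaled by \(R^{-1}B^{T}\).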

The following result for risk-bearing decisions has therefore been proved; it is summarized for each incumbent agent i, \(i \in \mathcal{I}\), which autonomously selects \(K_{i}^{{\ast}}\) as its person-by-person equilibrium (or, equivalently, feedback Nash) strategy in the presence of its immediate neighbors’ feedback Nash policy parameters \(K_{-i}^{{\ast}}\), as in Fig. 1.

Fig. 1

Unified framework of measuring risk judgments and modeling choices and decisions

Theorem 5 (Person-by-Person Equilibrium Policies for Distributed Control). 

Consider the linear-quadratic class of distributed stochastic systems whose descriptions are governed by (6)–(13), and assume that \((A_{i},B_{i})\) and \((A_{i},C_{i})\) for \(i \in \mathcal{I}\) are uniformly stabilizable and detectable, respectively. Assume that incumbent systems or agents are constrained to admissible decision laws \(u_{i}(\cdot ) = K_{i}(\cdot )\hat{z}_{i}(\cdot )\) , where the conditional mean estimates \(\hat{z}_{i}(\cdot )\) are governed by the decentralized state-estimation dynamics (9). Further, let each incumbent agent i select \({k}^{i} \in \mathbb{N}\) and the sequence of nonnegative coefficients \({\mu }^{i} =\{\mu _{ r}^{i} \geq 0\}_{r=1}^{{k}^{i} }\) with \(\mu _{1}^{i} > 0\) . Then, there exists a person-by-person equilibrium which strives to optimize the risk-value awareness performance indices (41); e.g.,

$$\displaystyle\begin{array}{rcl} u_{{\ast}}^{i}(t)& =& K_{ i}^{{\ast}}(t)\hat{z}_{ i}^{{\ast}}(t),\quad t \triangleq t_{ 0} + t_{f} -\tau {}\end{array}$$
(53)
$$\displaystyle\begin{array}{rcl} K_{i}^{{\ast}}(\tau )& =& -R_{ i}^{-1}(\tau )B_{ i}^{T}(\tau )I_{ 0}^{T}\sum _{ r=1}^{{k}^{i} }\hat{\mu }_{r}^{i}\mathcal{H}_{ r{\ast}}^{i}(\tau )\,I_{ 0}{({(I_{0}^{T}I_{ 0})}^{-1})}^{T}\,,\quad i \in \mathcal{I}{}\end{array}$$
(54)

wherein the parametric design freedoms \(\hat{\mu }_{r}^{i}\) represent the preferences toward specific summary statistical measures (e.g., mean, variance, skewness) chosen by incumbent agent i for performance reliability, whereas the optimal solutions \(\mathcal{H}_{r{\ast}}^{i}(\cdot )\) satisfy the backward-in-time matrix-valued differential equations

$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }\mathcal{H}_{1{\ast}}^{i}(\tau )& =& -{(F_{ {\ast}}^{i})}^{T}(\tau )\mathcal{H}_{ 1{\ast}}^{i}(\tau ) -\mathcal{H}_{ 1{\ast}}^{i}(\tau )F_{ {\ast}}^{i}(\tau ) - N_{ {\ast}}^{i}(\tau )\,,\quad \mathcal{H}_{ 1{\ast}}^{i}(t_{ f}) = N_{f}^{i}{}\end{array}$$
(55)
$$\displaystyle\begin{array}{rcl} \frac{d} {d\tau }\mathcal{H}_{r{\ast}}^{i}(\tau )& =& -{(F_{ {\ast}}^{i})}^{T}(\tau )\mathcal{H}_{ r{\ast}}^{i}(\tau ) -\mathcal{H}_{ r{\ast}}^{i}(\tau )F_{ {\ast}}^{i}(\tau ) \\ & & \quad -\sum _{s=1}^{r-1}{ 2r! \over s!(r - s)!}\mathcal{H}_{s{\ast}}^{i}(\tau ){G}^{i}(\tau ){W}^{i}{({G}^{i})}^{T}(\tau )\mathcal{H}_{ r-s{\ast}}^{i}(\tau ),\,\mathcal{H}_{ r{\ast}}^{i}(t_{ f}) \\ & & = 0,\,2 \leq r \leq {k}^{i}\,. {}\end{array}$$
(56)
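A minimal numerical sketch of the backward-in-time equations (55)–(56) follows, assuming constant coefficients \(F_{\ast}^{i}\), \(N_{\ast}^{i}\), and \(G^{i}W^{i}(G^{i})^{T}\) (all hypothetical sample data) and a simple Euler march in decreasing τ; the helper name `solve_H` is introduced here for illustration only.

```python
import numpy as np
from math import factorial

# Backward-in-time Euler integration of (55)-(56), with constant hypothetical
# coefficients Fs ~ F*, Ns ~ N*, GWG ~ G W G^T, and terminal value Nf.
def solve_H(Fs, Ns, GWG, Nf, k, t0, tf, steps=2000):
    dt = (tf - t0) / steps
    n = Fs.shape[0]
    # Terminal conditions: H_1(tf) = Nf, H_r(tf) = 0 for r >= 2
    H = [Nf.copy()] + [np.zeros((n, n)) for _ in range(k - 1)]
    for _ in range(steps):  # march from tf down to t0
        Hnew = []
        for r in range(1, k + 1):
            dH = -Fs.T @ H[r - 1] - H[r - 1] @ Fs
            if r == 1:
                dH = dH - Ns
            else:  # coupling terms of (56)
                for s in range(1, r):
                    c = 2 * factorial(r) / (factorial(s) * factorial(r - s))
                    dH = dH - c * H[s - 1] @ GWG @ H[r - s - 1]
            Hnew.append(H[r - 1] - dt * dH)  # step in decreasing tau
        H = Hnew
    return H  # approximate values at t0
```

Over a long horizon with a stable constant \(F_{\ast}^{i}\), the first solution settles at the corresponding algebraic Lyapunov solution, which provides a convenient sanity check.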

In addition, the decentralized state estimates \(\hat{z}_{i}^{{\ast}}(t)\) associated with incumbent agent i, \(i \in \mathcal{I}\), when the person-by-person equilibrium policy (53) is applied, satisfy the forward-in-time vector-valued differential equation with \(\hat{z}_{i}^{{\ast}}(t_{0}) = z_{i}^{0}\)

$$\displaystyle{ d\hat{z}_{i}^{{\ast}}(t) = (A_{ i}(t)\hat{z}_{i}^{{\ast}}(t)+B_{ i}(t)u_{i}^{{\ast}}(t)+u_{ -i}^{{\ast}}(t))dt+L_{ i}(t)[dy_{i}^{{\ast}}(t)-C_{ i}(t)\hat{z}_{i}^{{\ast}}(t)dt] }$$
(57)

and

$$\displaystyle{ dz_{i}^{{\ast}}(t) = (A_{ i}(t)z_{i}^{{\ast}}(t) + B_{ i}(t)u_{i}^{{\ast}}(t))dt + u_{ -i}^{{\ast}}(t)dt + G_{ i}(t)d\xi _{i}(t),\,\,z_{i}^{{\ast}}(t_{ 0}) = z_{i}^{0} }$$
(58)
$$\displaystyle{ u_{-i}^{{\ast}}(t)dt =\sum _{ j=1,j\neq i}^{N_{i} }B_{j}(t)u_{j}^{{\ast}}(t)dt + d\eta _{ i}(t) }$$
(59)
$$\displaystyle{ dy_{i}^{{\ast}}(t) = C_{ i}(t)z_{i}^{{\ast}}(t)dt + dv_{ i}(t) }$$
(60)

whereby the decentralized filter gain is \(L_{i}(t) = \Sigma _{i}(t)C_{i}^{T}(t)V _{i}^{-1}\) and the state-estimate error covariance \(\Sigma _{i}(t)\) is determined by the forward-in-time matrix-valued differential equation with initial-value condition \(\Sigma _{i}(t_{0}) = 0\)

$$\displaystyle{ \frac{d} {dt}\Sigma _{i}(t) = A_{i}(t)\Sigma _{i}(t)+\Sigma _{i}(t)A_{i}^{T}(t)+G_{ i}(t)\Xi _{i}G_{i}^{T}(t)+I_{ i}-\Sigma _{i}(t)C_{i}^{T}(t)V _{ i}^{-1}C_{ i}(t)\Sigma _{i}(t)\,. }$$
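The filter covariance equation above is a standard forward-in-time Riccati equation and can be propagated by simple time stepping. The sketch below assumes constant, hypothetical coefficients (including the additive term \(I_{i}\) exactly as printed) and the helper name `propagate_covariance` introduced for illustration:

```python
import numpy as np

# Forward Euler propagation of the filter Riccati equation; all coefficient
# matrices (stand-ins for A_i, C_i, G_i, Xi_i, V_i, I_i) are hypothetical.
def propagate_covariance(A, C, G, Xi, V, Ii, t0, tf, steps=2000):
    n = A.shape[0]
    Sigma = np.zeros((n, n))  # Sigma_i(t0) = 0
    dt = (tf - t0) / steps
    Vinv = np.linalg.inv(V)
    for _ in range(steps):
        dS = (A @ Sigma + Sigma @ A.T + G @ Xi @ G.T + Ii
              - Sigma @ C.T @ Vinv @ C @ Sigma)
        Sigma += dt * dS
    L = Sigma @ C.T @ Vinv  # decentralized filter gain L_i(t)
    return Sigma, L
```

In the scalar case with \(A_i = 0\) and unit noise intensities, the covariance follows \(\dot{\Sigma } = 1 - {\Sigma }^{2}\) and saturates at 1 over a long horizon, matching the algebraic Riccati balance.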

Notice that, for the person-by-person equilibrium policy (53) of incumbent agent i to be defined and continuous for all \(\tau \in [t_{0},t_{f}]\), the solutions \(\mathcal{H}_{r{\ast}}^{i}(\tau )\) to equations (55)–(56) must exist when evaluated at τ = t 0. Thus, it is necessary that \(\mathcal{H}_{r{\ast}}^{i}(\tau )\) be finite for all \(\tau \in [t_{0},t_{f})\). The solutions of (55)–(56) exist and are continuously differentiable in a neighborhood of t f ; applying the result from [13], they can be extended to the left of t f as long as \(\mathcal{H}_{r{\ast}}^{i}(\tau )\) remain finite. Hence, unique and continuously differentiable solutions to (55)–(56) exist whenever \(\mathcal{H}_{r{\ast}}^{i}(\tau )\) are bounded for all \(\tau \in [t_{0},t_{f})\). Consequently, the candidate value functions \({\mathcal{W}}^{i}(\tau,{\mathcal{H}}^{i},\breve{{\mathcal{D}}}^{i},{\mathcal{D}}^{i})\) are continuously differentiable.

5 Conclusions

The present research offers a theoretic lens and a novel approach that direct attention toward the mathematical statistics of the chi-squared random performance measures concerned by incumbent agents of the class of distributed stochastic systems herein, thus providing new insights into the complex dynamics of performance robustness and reliability. To account for mutual influence from immediate neighbors that gives rise to interaction complexity, such as potential noncooperation, each incumbent system or self-directed agent autonomously searches for a person-by-person equilibrium, which is in turn locally supported by noisy state observations. In view of performance risks, a new paradigm for understanding and building decentralized person-by-person equilibrium policies for the emergence of flexibly autonomous systems is obtained, with which the self-directed agents of incumbent systems, constrained to decentralized information processing and distributed decision making, are fully capable of implementing risk-bearing actions and local best responses in furtherance of their own goals.