1 Introduction

Research on multi-agent dynamic games has a long history in the control community. A good survey of non-cooperative dynamic games can be found in Basar and Olsder [4]. In recent years, dynamic game theory has found new inspiration and renewed vitality in network control and multi-agent systems. Within the framework of dynamic games, many researchers have considered flow control, routing control, and multi-agent cooperation problems [13, 5, 14, 15, 17]. In distributed multi-agent systems there is generally no centralized control station, and each agent has only limited sensing and communication ability, so the control design is usually required to be decentralized. In a decentralized control framework, the control input of each agent can use only its local state or, under certain circumstances, the states of the agents in its sensing/communication neighborhood.

Recently, Huang, Caines, and Malhamé did pioneering work on decentralized stochastic games for a class of individual-population interacting multi-agent systems with mean-field coupling [16, 18–22], which have a wide application background in biological, engineering, and economic systems [6, 14, 29, 34]. In such systems, the number of agents is very large. Each agent is driven by stochastic noise and interacts with all other agents via the population state average (PSA). The interaction between individual states and the PSA appears in both the dynamic equation and the cost function of every agent. For a given agent, the impact of any other single agent is negligible, whereas the impact of the overall population on its evolution is significant. Although the agents are coupled through the PSA, the PSA cannot be used for individual control design, since it is unknown to any given agent. This is an essential difficulty in the control design for decentralized mean field games. To overcome it, Huang, Caines, and Malhamé proposed the Nash Certainty Equivalence (NCE) principle. Under the NCE principle, the PSA is approximated by its mean field approximation, a deterministic signal, which is then used in place of the PSA for individual control design. This principle is similar in spirit to the well-known Certainty Equivalence (CE) principle in adaptive control, where the unknown parameters are estimated and the estimates are used as if they were the true parameters to construct the control laws. LQG mean field games with scalar agent models and deterministic discounted cost functions are studied in Huang, Malhamé, and Caines [16, 18, 20, 22]. Li and Zhang [26, 27] introduced the concept of asymptotic Nash equilibria in the probability sense and extended the results to the cases with state space or ARX dynamic models and stochastic ergodic cost functions. The mean field method was also developed independently by Lasry and Lions [25], and by Weintraub and Benkard [31–33] using the concept of oblivious equilibrium. Mean field control for Markov decision problems is considered in Tembine et al. [30]. Decentralized mean field games have since been extended to non-linear dynamic models and to the case with heterogeneous agents [13, 21, 23].

For decentralized mean field games, most of the relevant literature assumes that the dynamic models of the agents are precisely known. In real systems, however, there may be parametric uncertainties or unmodelled dynamics in the agents’ models due to unknown or uncertain factors in the environment. Generally speaking, the parametric uncertainties can be divided into two categories: unknown local parameters, which contain information about the local environment, and unknown global parameters, which are shared by all agents. In this paper, we assume that the local dynamics of each agent are precisely known, but that the common coupling strength between the individual state and the PSA, which is a global parameter, is unknown.

To reduce the model uncertainty, each agent can exploit its learning ability to refine its dynamic model step by step from measured data. Using an individual online learner or identifier, each agent uses its estimate of the coupling strength to construct its individual control law, which aims at optimizing its cost function. The overall system therefore emerges as a large population decentralized adaptive game. In such adaptive games there are two estimation processes: one is the estimation of the PSA and the other is the identification of the unknown coupling strength. A key difficulty is that each agent’s dynamic equation contains a product of the unknown PSA and the unknown coupling strength. If traditional identification algorithms were used, the regression vector in each agent’s identification algorithm would contain the PSA as a component. However, the PSA is unavailable to each individual. Intuitively, the estimate of the PSA can be used in place of the PSA to construct the identification algorithm. Unfortunately, this may couple the estimation process for the PSA with that for the unknown coupling strength. Decentralized adaptive games for individual-population interacting systems were first considered in Huang, Malhamé, and Caines [19] and Kizilkale and Caines [24]. In Kizilkale and Caines [24], the dynamic equations of the agents are uncoupled and the local dynamic parameters are unknown, while in Huang, Malhamé, and Caines [19], the dynamics of the agents are coupled, but the precise value of the PSA is used in the identification algorithm. In brief, the coupling between the two estimation processes, which is a key difficulty in decentralized mean field adaptive games, does not arise in Huang, Malhamé, and Caines [19] or Kizilkale and Caines [24]. To the best of our knowledge, no existing work addresses the case where both the PSA and the coupling strength are unknown.

For decentralized adaptive mean field games, there are some fundamental problems that have to be studied.

(1) Is the closed-loop system stable, that is, do the states of all agents remain bounded as time goes on? If so, is the stability retained when the number N of agents increases to infinity?

(2) Is the estimate of the PSA strongly consistent, or does the estimation error for the PSA converge to zero almost surely with respect to some metric as N tends to infinity? If so, what is the convergence rate?

(3) Is the identification algorithm for the coupling strength strongly consistent, or are the estimation errors bounded? If the estimation errors are bounded, does the bound converge to zero as N tends to infinity, and at what rate?

(4) Is the designed decentralized control law asymptotically optimal almost surely, or is it an almost sure asymptotic Nash equilibrium? If so, at what rate does the sub-optimal cost of each agent converge to the optimal value as N tends to infinity?

From the above, it can be seen that the large population decentralized adaptive mean field game is essentially different from traditional adaptive control for single agent systems [7, 9, 11], and that the solutions to the convergence problems (1)–(4) cannot be found in the existing theoretical framework. It is worth pointing out that this kind of adaptive control is also essentially different from decentralized adaptive games for large-scale systems [28, 35, 36]. In decentralized adaptive games for large-scale systems, the regression vector of each agent’s identification algorithm does not need to contain any unknown states, so there is no coupling between the identification of unknown parameters and the estimation of unknown states. The large population decentralized adaptive mean field game is closely related to robust adaptive control. The identification algorithm for the coupling strength, which involves the estimate of unknown states, can be viewed as the identification algorithm for a dynamic system with unmodelled dynamics. However, the assumptions on the unmodelled dynamics in robust adaptive control [8, 10] become closed-loop assumptions here and are hard to verify in the decentralized control framework.

In this paper, we consider the decentralized adaptive mean field game for individual-population interacting stochastic multi-agent systems. The dynamic equation of each agent is described by a discrete-time ARX model that is coupled through the PSA with unknown coupling strength. Each agent has a group tracking type cost function, which is also coupled through the PSA. Firstly, based on the NCE principle, the PSA is estimated by a deterministic signal. Secondly, the estimate of the PSA is used to construct the decentralized least squares (LS) identification algorithm for the coupling strength. Finally, the estimates of the PSA and the coupling strength are both used to construct the decentralized control law based on the NCE and the CE principles. We analyze the decentralized LS algorithm by the stochastic Lyapunov method and then, by probability limit theory and under mild conditions, obtain the following convergence results for the closed-loop system:

(i) The closed-loop system is stable almost surely, and the states of the agents remain bounded as N tends to infinity.

(ii) As N tends to infinity, the estimation error for the PSA converges to zero with the rate O(1/N) almost surely.

(iii) As N tends to infinity, the identification error for the unknown coupling strength converges to zero with the rate \(O(1/\sqrt{N})\).

(iv) The designed decentralized control law is an almost sure asymptotic Nash equilibrium, and the cost function of each agent is almost surely asymptotically optimal with the convergence rate O(1/N), given that all other agents also employ the strategy specified by the asymptotic Nash equilibrium.

The remainder of this paper is organized as follows. In Sect. 2, the system model and the decentralized game problem are formulated. In Sect. 3, a detailed design procedure for the two-level decentralized adaptive control law, based on the NCE and the CE principles, is presented. In Sect. 4, the main results of this paper are presented, regarding the stability of the closed-loop system, the asymptotic optimality of the control law, and the asymptotic consistency of the estimates of the PSA and the coupling strength. In Sect. 5, a numerical example is used to demonstrate our theoretical results. In Sect. 6, some concluding remarks are given.

The following notation will be used throughout this paper. \(\mathbb{R}\) denotes the set of all real numbers. For a given random variable (r.v.) ξ on a probability space \((\varOmega, \mathcal{F}, P)\), E(ξ) denotes the mathematical expectation of ξ. For a family \(\{\xi_{\lambda}, \lambda\in\varLambda\}\) of real-valued r.v.s, \(\sigma(\xi_{\lambda}, \lambda\in\varLambda)\) denotes the σ-algebra \(\sigma(\{\xi_{\lambda}\in B\}, B\in\mathcal{B}, \lambda\in\varLambda)\), where \(\mathcal{B}\) denotes the one-dimensional Borel sets. For a sequence \(\{\mathcal{F}_{t}, t\geq0\}\) of non-decreasing σ-algebras and a sequence {ξ(t),t≥0} of r.v.s, we say ξ(t) is adapted to \(\mathcal{F}_{t}\) or \(\{\xi(t),\mathcal{F}_{t}\}\) is an adapted sequence, if for any t≥0, ξ(t) is \(\mathcal{F}_{t}\) measurable.

2 Problem Formulation

We consider a system of N agents denoted by \(S^{N}\). The dynamic equation of agent i is given by

$$ x_i^N(t+1)=g_i\bigl(x_i^N(t),t\bigr)+u_i^N(t)+\alpha\overline{x}_N(t)+\omega_i(t+1), $$
(1)

where \(x_{i}^{N}\in\mathbb{R}\), \(u_{i}^{N}\in\mathbb{R}\) are the state and control input, respectively; \(\overline{x}_{N}(t)\stackrel {\triangle}{=} \frac{1}{N}\sum_{j=1}^{N}x_{j}^{N}(t)\) is the PSA; \(\omega_{i}(t)\in\mathbb{R}\) is the random noise; \(g_{i}(\cdot,\cdot): \mathbb{R}\times\mathbb{R}\to\mathbb{R}\) is a known Borel measurable function; \(\alpha\in\mathbb{R}\) is the unknown coupling parameter satisfying |α|<1. Note that model (1) is just the scalar version of the dynamic model considered in [26], the difference being that here the coupling strength α is unknown.

For model (1), we have the following assumptions:

(A1) \(\{\{\omega_{i}(t),\mathcal{F}^{i}_{t}\}, 1\leq i\leq N, N\geq 1\}\) is a family of independent martingale difference sequences defined on a probability space \((\varOmega,\mathcal{F},P)\) with the following properties: there exist constants σ>0 and β>2, such that

where \(\mathcal{F}_{t}^{i}\stackrel{\triangle}{=}\sigma(\omega _{i}(s),0\leq s\leq t)\).

(A2) \(\{x_{i}^{N}(0), 1\leq i\leq N, N\geq1\}\) is independent of \(\{\{\omega_{i}(t),\mathcal{F}^{i}_{t}\},i\geq1\}\), with a common mathematical expectation \(x_{0}\stackrel{\triangle }{=}Ex_{1}^{N}(0)<\infty\).

The cost function of agent i is given by

$$ J_{i}^{N}\bigl(u_i^N,u_{-i}^N \bigr)=\limsup_{T\to\infty}\frac{1}{T} \sum_{t=0}^T \bigl[x_i^N(t+1)-\varPhi\bigl(t,\overline{x}_N(t) \bigr)\bigr]^{2}, $$
(2)

where \(u_{-i}^{N}=(u_{1}^{N},\ldots, u_{i-1}^{N}, u_{i+1}^{N}, \ldots, u_{N}^{N})\), \(\varPhi(t,x):[0,\infty)\times\mathbb{R}\rightarrow \mathbb {R}\) is a Borel measurable function.
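For intuition, the limsup in (2) can be approximated over a finite horizon. The following Python sketch evaluates the time average in (2) for a recorded trajectory. The function name `tracking_cost` and the list-based inputs are illustrative assumptions and do not appear in the paper.

```python
from typing import Callable, Sequence


def tracking_cost(states: Sequence[float],
                  psa: Sequence[float],
                  phi: Callable[[int, float], float]) -> float:
    """Finite-horizon version of cost (2).

    Computes (1/T) * sum_{t=0}^{T} [x_i(t+1) - Phi(t, xbar(t))]^2, where
    states[t] holds x_i(t) for t = 0..T+1 and psa[t] holds the PSA for t = 0..T.
    The limsup over T in (2) is not taken; T is fixed by the data length.
    """
    T = len(psa) - 1
    if T < 1 or len(states) < T + 2:
        raise ValueError("need len(psa) = T+1 >= 2 and len(states) >= T+2")
    total = 0.0
    for t in range(T + 1):
        err = states[t + 1] - phi(t, psa[t])  # tracking error at step t
        total += err * err
    return total / T
```

For example, with Φ(t,x)=x, a trajectory whose next state always equals the current PSA has zero cost, while a constant tracking error of 1 over three steps with T=2 gives 3/2.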

With regard to the cost function, the following assumptions will be used in the closed-loop analysis:

(A3) The solution of the non-linear iteration x(t+1)=Φ(t,x(t)) with x(0)=x 0 satisfies

(A4) The solution of the non-linear iteration x(t+1)=Φ(t,x(t)) with x(0)=x 0 satisfies

It can be easily verified that if Φ(t,x)=x, t∈[0,∞), \(x\in \mathbb{R}\), then both (A3) and (A4) hold.

For convenience of citation, for agent i, we denote the global-measurement-based admissible control set by

$$\mathcal{U}^N_{g,i}\stackrel{\triangle}{=}\Biggl \{u_{i}^{N}\ |\ u_{i}^{N}(t)\ \hbox{is adapted to } \sigma\Biggl(\bigcup_{j=1}^N \sigma \bigl(x_j^N(s), 0\leq s\leq t\bigr)\Biggr)\Biggr\}, $$

the local-measurement-based admissible control set by

$$\mathcal{U}^N_{l,i}\stackrel{\triangle}{=}\bigl \{u_{i}^{N}\ |\ u_{i}^{N}(t)\ \hbox{is adapted to} \ \sigma\bigl(x_i^N(s), 0\leq s\leq t\bigr) \bigr\}, $$

and the admissible control set by \(\mathcal{U}^{N}_{i}\). The so-called decentralized game means that agent i synthesizes \(u_{i}^{N}\) based only on its local measurement (i.e. \(\mathcal{U} _{i}^{N}=\mathcal{U}_{l,i}^{N}\)) to minimize its cost function \(J_{i}^{N}(u_{i}^{N}, u_{-i}^{N})\). We denote a control group of the sequence \(S^{N}\) of systems by \(\mathbf{U}^{N}=\{u_{i}^{N},1\leq i\leq N\}\), and its associated cost function group by \(\mathbf{J}^{N}=\{J_{i}^{N}(u_{i}^{N}, u_{-i}^{N}),\ 1\leq i \leq N\}\). To characterize the asymptotic optimality of the decentralized control law with respect to the stochastic cost functions, we introduce the concept of almost sure asymptotic Nash-equilibrium given in Li and Zhang [26].

Definition 1

For system (1), a sequence of control groups \(\{\mathbf{U}^{N}=\{u_{i}^{N},1\leq i\leq N\},N\geq1\}\) is called an almost sure asymptotic Nash-equilibrium with respect to the associated sequence of cost function groups \(\{\mathbf{J}^{N}=\{ J_{i}^{N},1\leq i\leq N\},N\geq1\}\), if there exists a sequence of non-negative r.v.s \(\{\epsilon_{N}(\omega), N\geq1\}\) on the probability space \((\varOmega, {\mathcal{F}},P)\), such that \(\epsilon_{N}\to0\) a.s., as N→∞, and for sufficiently large N,

(3)

By Theorem 2.1 of [26], we know that \(\inf_{v_{i}\in\mathcal{U}_{g, i}^{N}}J_{i}^{N}(v_{i}, u_{-i}^{N})=\sigma^{2}\). In the following, we will design a decentralized control law \(\{\mathbf{U}^{N}, N\geq1\}\) such that the closed-loop system satisfies

that is, the sequence of control groups \(\{\mathbf{U}^{N}, N\geq1\}\) is an almost sure asymptotic Nash-equilibrium.

3 Control Design

We first review the results for the case with known coupling strength.

3.1 The Case with Known Coupling Strength

For the centralized control law design, the control of agent i depends on the PSA \(\overline{x}_{N}\), while for the design of the decentralized control law, the PSA is unknown. If the coupling strength α is known, we may use the NCE principle to design the decentralized control law. Firstly, we construct an estimate f(t) of the PSA with the following property: if every agent takes f(t) as the estimate of the PSA and makes its optimal decision according to f(t), then the expectation of the closed-loop PSA is exactly f(t) or converges to it as N increases to infinity. Secondly, if an f(t) with the above property indeed exists, then we can construct the decentralized control law by using f(t) instead of \(\overline{x}_{N}(t)\).

Based on the NCE principle, we now design the decentralized control law.

The auxiliary equation of agent i is given by

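$$ \widehat{x}_i^N(t+1)=g_i\bigl(\widehat{x}_i^N(t),t\bigr)+\widehat{u}_i^N(t)+\alpha f(t)+\omega_i(t+1), $$
(4)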

with a tracking-type cost function:

$$J_i^N\bigl(\widehat{u}_i^N \bigr)=\limsup_{T\to\infty}\frac{1}{T}\sum_{t=0}^T \bigl[\widehat{x}_i^N(t+1)-\varPhi\bigl(t,f(t)\bigr) \bigr]^2. $$

In this case, the optimal control is obviously

$$ \widehat{u}_i^N(t)=\varPhi \bigl(t,f(t)\bigr)-g_i\bigl(\widehat{x}_i^N(t),t \bigr)-\alpha f(t). $$
(5)

Substituting control (5) into the model (4), we have

$$ E\widehat{x}_i^N(t+1)=\varPhi\bigl(t, f(t)\bigr),\qquad E\widehat{x}_i^N(0)=x_{0}. $$
(6)

As mentioned above, the mathematical expectation of the closed-loop PSA ought to be f(t), that is,

$$ \frac{1}{N}\sum_{j=1}^{N}E \widehat{x}_j^N(t)=f(t), \quad t\geq0. $$
(7)

Therefore, the unique solution of the auxiliary system (6) and (7) can be used as the estimate of the PSA. We denote it by \(f^{*}(t)\), which is iteratively given by

$$ f^*(t+1)=\varPhi\bigl(t, f^*(t)\bigr),\quad t\geq0, \qquad f^*(0)=x_0. $$
(8)

By (5) and the NCE principle, the control law for agent i can be taken as

$$ u_i^{0}(t)=\varPhi\bigl(t, f^{*}(t)\bigr)-g_i\bigl(x_i^N(t),t \bigr)-\alpha f^{*}(t). $$
(9)

Here and hereafter, the superscript N of \(u_{i}^{0^{N}}(t)\) is omitted for conciseness. Comparing (9) with the centralized control law, it can be seen that \(\overline{x}_{N}\) is replaced by \(f^{*}\) in (9) for control design.
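As an illustration of (8) and (9), the following Python sketch computes \(f^{*}\) off-line and applies the control law (9) to a population of agents. It assumes the local dynamics g_i(x,t)=x and the additive model form used in the numerical example of Sect. 5. The function names, the Gaussian noise generator, and the argument layout are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def nce_signal(phi: Callable[[int, float], float], x0: float, T: int) -> List[float]:
    """Iterate (8): f*(t+1) = Phi(t, f*(t)) with f*(0) = x0; returns f*(0..T)."""
    f = [x0]
    for t in range(T):
        f.append(phi(t, f[t]))
    return f


def simulate_known_alpha(phi: Callable[[int, float], float], alpha: float, x0: float,
                         x_init: Sequence[float], T: int, noise_std: float,
                         seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Closed loop under control law (9), assuming g_i(x, t) = x as in Sect. 5.

    Returns (f_star, states), where states[t][i] is x_i(t).
    """
    rng = random.Random(seed)
    f_star = nce_signal(phi, x0, T)
    states = [list(x_init)]
    for t in range(T):
        x = states[-1]
        xbar = sum(x) / len(x)  # PSA: enters the dynamics only, never the control
        nxt = []
        for xi in x:
            u = phi(t, f_star[t]) - xi - alpha * f_star[t]  # control law (9)
            nxt.append(xi + u + alpha * xbar + rng.gauss(0.0, noise_std))
        states.append(nxt)
    return f_star, states
```

With zero noise and all initial states equal to x_0, the simulated states coincide with \(f^{*}(t)\) at every step, which is the consistency property (7) in its simplest form.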

As shown in [26], we can prove the asymptotic consistency of the estimate \(f^{*}\) for the PSA, and the stability and asymptotic optimality of the closed-loop system under the control law (9). We have the following propositions [26].

Proposition 1

For system (1), if Assumptions (A1)–(A2) hold, then under the control law (9), the closed-loop system has the following properties:

(10)

where

(11)

is the estimation error for the PSA.

Proposition 2

For system (1), if Assumptions (A1)–(A3) hold, then under the control law (9), the closed-loop system satisfies

Proposition 3

For system (1) with cost function (2), if Assumptions (A1)–(A2) hold and there exists a constant γ>0 such that |Φ(t,x)−Φ(t,y)|≤γ|x−y|, ∀x, \(y\in\mathbb{R}\), t≥0, then under the control law (9), the associated cost function group satisfies

(12)

where

3.2 The Case with Unknown Coupling Strength

If the coupling strength α is unknown, then the control law (9) is unavailable. Naturally, one might think that based on the model (1), agent i could use the recursive LS algorithm to estimate α:

(13)
(14)

then, by the CE principle, the estimate \(\overline {\alpha}_{i}(t)\) could be used instead of α to construct the control law

(15)

However, the control law (13)–(15) is not decentralized, because the identification algorithm uses the PSA. Since the PSA \(\overline{x}_{N}(t)\) is unknown to agent i, we use \(f^{*}(t)\), the estimate of \(\overline{x}_{N}(t)\) based on the NCE principle, to construct the identification algorithm of agent i:

(16)
(17)

where \(\alpha_{0}\) and \(P(0)=P_{0}\) are initial conditions to be designed, and \(f^{*}(t)\) is computed off-line by (8). The identification algorithm (16) and (17) is decentralized, since it uses only the local state and input of each agent. Then, by the CE principle, we use the estimate \(\alpha_{i}(t)\) to construct the control law:

$$ u_i^{N}(t)=\varPhi\bigl(t, f^{*}(t)\bigr)-g_i\bigl(x_i^N(t),t \bigr)-\alpha_i(t) f^{*}(t). $$
(18)

It can be seen that the control law (16)–(18) is decentralized and it is designed based on both the NCE and the CE principles.

Remark 1

Here, a decentralized two-level control scheme is used for adaptive mean field games. On the high level, the PSA is estimated based on the NCE principle. On the low level, the coupling strength is identified based on the decentralized LS algorithms and the estimate of the PSA. The decentralized control law is constructed by combining the NCE and the CE principles.
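The low level of this scheme can be sketched in Python as follows. The update below is the standard scalar recursive LS form with regressor \(f^{*}(t)\), which is an assumption about the structure of (16) and (17) rather than a transcription of them. The class name `DecentralizedLS` and the choice g_i(x,t)=x (as in Sect. 5) are likewise illustrative.

```python
class DecentralizedLS:
    """Scalar recursive least squares with f*(t) in place of the unknown PSA.

    Assumed (standard) recursion, not copied from (16)-(17):
        alpha(t+1) = alpha(t) + P(t) f / (1 + P(t) f^2) * (y - alpha(t) f)
        P(t+1)     = P(t) - P(t)^2 f^2 / (1 + P(t) f^2)
    where f = f*(t) and y = x_i(t+1) - g_i(x_i(t), t) - u_i(t) uses local data only.
    """

    def __init__(self, alpha0: float, p0: float) -> None:
        self.alpha = alpha0  # current estimate alpha_i(t)
        self.p = p0          # current P(t)

    def update(self, y: float, f: float) -> float:
        """Process one observation y with regressor f and return the new estimate."""
        denom = 1.0 + self.p * f * f
        gain = self.p * f / denom
        self.alpha += gain * (y - self.alpha * f)
        self.p -= self.p * self.p * f * f / denom
        return self.alpha

    def control(self, target: float, x: float, f: float) -> float:
        """CE control with g_i(x, t) = x: u = Phi(t, f*) - x - alpha_i(t) f*."""
        return target - x - self.alpha * f
```

Under this assumed recursion, \(P^{-1}(t+1)=P^{-1}(t)+f^{*2}(t)\), so the identifier needs only the off-line signal \(f^{*}\) and the agent's own state and input.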

4 Closed-Loop Analysis

In this section, we analyze the identification algorithm, stability and optimality of the closed-loop system and the consistency of the estimates for the PSA and the coupling strength.

4.1 Identification Algorithm Analysis

From the model (1) and (16), one gets

(19)

where \(\widetilde{\alpha}_{i}(t)\stackrel{\triangle}{=}\alpha-\alpha _{i}(t)\) is the estimation error for the coupling strength α, and \(\xi_{N}(t)\) is the estimation error for the PSA given by (11).

Denote \(V_{i}(t+1)=\widetilde{\alpha}_{i}^{2}(t+1)P^{-1}(t+1)\), \(r(T)=e+\sum_{t=0}^{T}f^{*^{2}}(t)\). From (17), we know that

(20)

Summing both sides of the above equation, we have

(21)

From (19) and (20), we have

(22)

For the identification algorithm (16) and (17), we have the following results, which are important for the closed-loop analysis of the decentralized control law.

Theorem 1

If Assumption (A1) holds and

(23)

then the identification algorithm (16) and (17) has the following properties:

(i)

(24)

(ii)

(25)

(iii)

(26)

Proof

From (23), we have

(27)

Then, summing both sides of (22) from \(t=t_{1}\) to t=T, we get

(28)

From the above equation, (23) and Lemma A.1, we have

(29)

From (21), we have

(30)

and then

(31)

Denote \(\mathcal{F}_{t}=\sigma(\bigcup_{j=1}^{N}\mathcal{F}_{t}^{j})\). For any given ν∈(2,min{β,4}], by the \(C_{r}\) inequality, we have

which together with Assumption (A1) and the Lyapunov inequality leads to

(32)

Then by Lemma A.1, (30) and (31), for any given ϵ>0, noting that \(0\leq\frac{f^{*^{2}}(t)}{P_{0}^{-1}+\sum_{k=0}^{t}f^{*^{2}}(k)}\leq1\), we have

(33)

which together with (29) leads to (i) and (iii). Combining (i) and (iii), we get (ii). □

Remark 2

By (11), the model (1) can be rewritten as

So, (16), (17), and (18) can be viewed as the identification algorithm and adaptive control law for model

with \(\alpha\xi_{N}(t)\) as the unmodelled dynamics. Since \(\alpha\xi_{N}(t)\) contains the states of all other agents, which are unavailable under the decentralized information pattern, the conditions on unmodelled dynamics used in robust adaptive control [8, 10] cannot be used here.

4.2 Stability, Optimality, and Consistency

Substituting the control (18) into the model (1), we get the closed-loop equation of agent i

(34)

Summing the above equation over i=1,2,…,N and using (8), we see that \(\xi_{N}(t)\) satisfies the following recursive equation:

(35)

From (19) and (35), it can be seen that the dynamic equation (35) of the estimation error and the dynamic equation (19) of the identification error are coupled together. Below is the main result of this paper.

Theorem 2

If Assumptions (A1)–(A4) hold, then for the system (1), under the control (8), (16), (17), and (18), we have

(i) The estimate for the PSA is asymptotically consistent:

(36)

where \(\|\xi_{N}\|_{T}=\sqrt{\frac{1}{T}\sum_{t=0}^{T}\xi_{N}^{2}(t)}\).

(ii) The closed-loop system is almost surely uniformly stable:

(37)

(iii) Furthermore, if there exists γ>0, such that for any x, y \(\in \mathbb{R}\) and t≥0, we have |Φ(t,x)−Φ(t,y)|≤γ|x−y|, then \(\{\mathbf{U}^{N}=\{u_{i}(t), 1\leq i\leq N\}, N\geq1\}\) is an almost sure asymptotic Nash equilibrium with respect to the associated sequence of cost function groups, and the cost function of each agent is almost surely asymptotically optimal with the convergence rate \(O(N^{-1})\) given that all other agents also employ the strategy specified by the asymptotic Nash equilibrium:

(38)

Proof

Take positive real numbers \(\epsilon\in(0, \frac{1-\alpha ^{2}}{2\alpha ^{2}})\) and \(\delta\in(0,1-\alpha^{2}(2\epsilon+1))\). From Assumption (A4), we know that

(39)

and similar to (27), we have

(40)

From (8) and Assumption (A3), we know that r(T)=O(T) as T→∞. Then by (39), (40), and (iii) of Theorem 1, we have

(41)

which together with (35) leads to

(42)

From the above and Assumption (A1), we get

(43)

where \(\mu(\alpha,\epsilon,\delta)\stackrel{\triangle}{=}\frac {\alpha ^{2}(2\epsilon+1)}{1-\delta}\). This together with (41) leads to

(44)

Furthermore, by (34), Assumption (A3) and Lemma A.1, we have (ii).

From (34), (2), and Assumption (A1), it follows that

(45)

where

and

which together with (45), (43), (44), and Lemma A.1 leads to

(46)

Letting ϵ and δ go to zero in (46) and (43), we get (iii), and

(47)

which gives (i). □

Remark 3

Comparing (36) and (38) with (10) and (12) shows that, for the case with unknown coupling strength, under the designed adaptive control law, the convergence rates of the estimation error for the PSA and of the cost function of each agent to the best response value are the same as those for the case with known coupling strength.

Remark 4

From Theorem 2, we can see that consistency of the identification of the coupling strength α is not necessary for the control law to be an asymptotic Nash equilibrium. This is similar to the case of the LS-based adaptive tracker (Guo and Chen [12]).

In the following theorem, under a certain excitation condition on the non-linear iteration, we obtain the asymptotic consistency of the identification algorithm, that is, the upper limit of the identification error vanishes as the number N of agents increases to infinity. We need the following assumption.

(A5) The solution of the non-linear iteration x(t+1)=Φ(t,x(t)) with x(0)=x 0 satisfies

Theorem 3

If Assumptions (A1)–(A5) hold, then for the system (1), under the control (8), (16), (17), and (18), the closed-loop system satisfies

where \(\underline{f}=\liminf_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T}(f^{*}(t))^{2}>0\).

Proof

By (ii) of Theorem 1 and Assumption (A4), we have

(48)

From Assumption (A5), we know that there exists \(c_{0}>0\), such that \(r(T)\geq c_{0}T\) for sufficiently large T, which together with (48), (i) of Theorem 2, and Assumption (A5) leads to

Letting ϵ go to zero, we get the conclusion of this theorem. □

5 Numerical Example

In this section, a numerical example is given to demonstrate that the closed-loop system is stable and that the designed decentralized control law is an asymptotic Nash equilibrium.

The dynamic equation for the ith agent is given by

$$x_i^N(t+1)=x_i^N(t)+u_i^N(t)+0.5 \overline{x}_N(t)+\omega_i(t+1), $$

where the initial value \(x_{i}^{N}(0)\) has the normal distribution N(5,1), and \(\{\omega_{i}(t), t\geq0\}\) is a sequence of Gaussian white noise with distribution N(0,2). So α=0.5, σ=2. The non-linear coupling function in the cost function is Φ(t,x)=5sin(x)+6. The initial parameter estimate is taken as \(\alpha_{0}=2\), with \(P_{0}=10\).

From (18), the decentralized control law is taken as

$$ u_i^{N}(t)=5\sin \bigl(f^{*}(t)\bigr)+6-x_i^N(t)-\alpha_i(t)f^{*}(t), \quad i=1,2,\ldots,N, $$
(49)

where \(f^{*}(t)\) is iteratively given by

$$f^*(t+1)=5\sin\bigl(f^*(t)\bigr)+6,\quad t\geq0, \ f^*(0)=5, $$

with (16) and (17) as the decentralized LS identifiers. Let N=10. The evolution of the estimation error for the PSA and of the states of the agents is shown in Fig. 1. It can be seen that \(\|\xi_{10}\|_{t}^{2}\) converges to \(\frac{\sigma ^{2}}{N(1-\alpha^{2})}=8/15\) as t→∞ and that the closed-loop system is stable. We then let N vary from 1 to 2000. The evolution of the maximum value of the cost functions with respect to N is shown in Fig. 2. It can be seen that the maximum value of the cost functions converges to the optimal value \(\sigma^{2}=4\) as N→∞.

Fig. 1 Trajectories of \(\|\xi_{10}\|_{t}^{2}\) and \(x_{i}(t)\), i=1,2,…,10 with respect to t

Fig. 2 Trajectory of \(\max_{1\leq i\leq N}J_{i}^{N}\) with respect to N
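A simulation along the lines of this example can be sketched in Python as follows. The sketch assumes the standard scalar recursive LS form for (16) and (17) with regressor \(f^{*}(t)\), treats the noise as Gaussian with standard deviation σ=2, and uses the current estimate \(\alpha_{i}(t)\) in the control, following the CE construction of Sect. 3.2. All function and variable names are illustrative, and this is not the code used to produce Figs. 1 and 2.

```python
import math
import random


def phi(x: float) -> float:
    """Coupling function of the example: Phi(t, x) = 5 sin(x) + 6."""
    return 5.0 * math.sin(x) + 6.0


def simulate_example(n_agents=10, horizon=2000, alpha=0.5, sigma=2.0,
                     alpha0=2.0, p0=10.0, seed=1):
    """Simulate x_i(t+1) = x_i(t) + u_i(t) + alpha * xbar(t) + w_i(t+1) under adaptive control.

    Returns (xi_sq_avg, estimates, max_abs_state), where xi_sq_avg is the time
    average of (xbar(t) - f*(t))^2 and estimates holds the final alpha_i.
    """
    rng = random.Random(seed)
    x = [rng.gauss(5.0, 1.0) for _ in range(n_agents)]
    est = [alpha0] * n_agents   # alpha_i(0) for every agent
    p = p0                      # P(t) depends only on f*, so it is common to all agents
    f = 5.0                     # f*(0) = x_0
    xi_sq_sum = 0.0
    max_abs = max(abs(v) for v in x)
    for _ in range(horizon):
        xbar = sum(x) / n_agents            # PSA, used by the plant only
        xi_sq_sum += (xbar - f) ** 2
        target = phi(f)
        denom = 1.0 + p * f * f
        for i in range(n_agents):
            u = target - x[i] - est[i] * f  # CE control with the current estimate
            x_next = x[i] + u + alpha * xbar + rng.gauss(0.0, sigma)
            y = x_next - x[i] - u           # locally measurable: alpha * xbar + noise
            est[i] += p * f / denom * (y - est[i] * f)  # assumed LS update, regressor f*
            x[i] = x_next
            max_abs = max(max_abs, abs(x_next))
        p -= p * p * f * f / denom          # assumed P(t) recursion
        f = target                          # f*(t+1) = Phi(t, f*(t))
    return xi_sq_sum / horizon, est, max_abs
```

Calling `simulate_example()` returns the time-averaged squared PSA estimation error, which for N=10 can be compared with the value 8/15 quoted above, together with the final estimates of α and the largest state magnitude observed.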

6 Conclusions

A decentralized adaptive tracking-type game has been considered in this paper for individual-population interacting systems, in which each agent interacts with the overall population via the PSA in the individual dynamics and cost function. The coupling strength with the PSA is unknown. A two-level adaptive control law is designed based on the NCE and the CE principles. Firstly, the PSA is estimated based on the NCE principle; then the estimate of the PSA is used to construct the decentralized LS identification algorithm for the coupling strength; finally, the estimates of the coupling strength and the PSA are used to construct the decentralized control law based on the NCE and the CE principles. It is shown that, under mild conditions and with probability one, the closed-loop system is stable, the decentralized control law is an asymptotic Nash equilibrium, and the estimates are asymptotically consistent as the number of agents goes to infinity.

Here, as a preliminary study in this direction, we give a framework for the problem and consider the case with scalar dynamic models of agents. For LS estimation, it is known that the estimation error satisfies \(\|\tilde{\theta}(t)\|^{2}=O (\frac{\ln (1+\sum_{i=0}^{t}\|\phi_{i}\|^{2} )}{\lambda_{\min} (\sum_{i=0}^{t+1}\phi_{i}\phi_{i}^{T} )} )\). If the regression vector \(\phi_{i}\) is scalar, consistency of the LS algorithm is easily obtained; for high dimensional regression vectors, however, consistency requires additional excitation conditions [7]. So for adaptive control systems, there are essential differences between scalar system models and high dimensional models. For future research, the extension to general linear models may be interesting. Another important issue is to consider the case with both unknown local and global parameters, which may be more widely applicable but is much more difficult.