1 Introduction

Game theory provides a theoretical framework for conceiving of social situations among competing players, using mathematical models of strategic interaction among rational decision-makers [1, 2]. Game-theoretic approaches to designing, modeling, and optimizing emerging engineering systems, biological behaviors, and problems in mathematical finance make this research topic an important tool in many fields, with a wide range of applications [3,4,5]. Indeed, numerous results for both theory and practice can be found in the literature on differential games [6,7,8,9,10,11,12].

In this context, games fall roughly into two categories: cooperative and noncooperative [13]. A game is cooperative if the players are able to form binding commitments that are externally enforced (e.g., through contract law), resulting in collective payoffs. A game is noncooperative if players cannot form alliances or if all agreements need to be self-enforcing (e.g., through credible threats); the focus is then on predicting individual players’ actions and payoffs and on analyzing Nash equilibria [14]. A Nash equilibrium is an outcome from which no player can increase its payoff by unilaterally changing its decision [13].

The development of algorithms to achieve convergence to a Nash equilibrium has been a focus of researchers for several decades [15, 16]. Some papers have also looked at learning aspects of various update schemes for reaching Nash equilibrium [17]. In the context of extremum seeking (ES), the authors in [18] study the problem of computing, in real time, the Nash equilibria of static noncooperative games with N players by employing a non-model-based approach. By utilizing ES [19] with sinusoidal perturbations, the players achieve stable, local attainment of their Nash strategies without the need for any model information.

On the other hand, time delays are among the most common phenomena arising in engineering practice and industry, including networking problems in areas such as network virtualization, software-defined networks, cloud computing, the Internet of Things, context-aware networks, green communications, and security [3, 4, 20]. Hence, the motivation for employing ES to optimize engineering processes commonly modeled by such a game-theoretic framework is clear and compelling. Publications on differential games with delays do exist [21,22,23,24,25,26,27], but the literature has not yet addressed extremum seeking feedback in this context.

However, this problem is not so simple to address. Notice that, despite the large number of publications on delay compensation via predictor feedback, no prior work rigorously treated ES in the presence of time delays. The reason is that delay compensation (such as predictor feedback [28]) is inherently model-based, whereas ES is inherently non-model-based. In addition, this is an extremely important problem, because ES is all about convergence with a good convergence rate, whereas a delay, when simply ignored, severely restricts the convergence rate or destabilizes the closed-loop system. Fortunately, we gave a positive answer to this question in our earlier publications [29, 30] by introducing solutions to the problem of designing multi-variable ES algorithms for delayed systems via predictors based on perturbation-based (averaging-based) estimates of the model. Since we have reached a deep understanding of the options for multi-input ES with delays, it is natural to move on to developing solid results for games under time delays from the multi-variable ES perspective. This is particularly challenging since, in the literature on multi-input delay compensation, most authors consider a centralized approach for the predictor design (the full vector multiplying the control inputs needs to be known), whereas games are decentralized.

In this sense, this paper pursues two sets of results: one for games with full sharing of information, and one for the more challenging case of games where there are restrictions on the information shared by the players. For the former, the predictor-based delay compensator for the average system needs to be multi-variable, which means that each player would need to know at least about the existence of the other players, including how many of them there are, and possibly also something about their payoff functions. The key object is the square matrix of second derivatives (Hessian) of the players’ payoffs in [18], which must be estimated using the perturbation signals of each player. Such a perturbation-based estimate of the matrix needs to be shared by all the players. In this case, the game exhibits some kind of “cooperation” among the players for a collective delay compensation. Basically, the players are forced to cooperate minimally so that the Nash equilibrium can be achieved in a scenario with delays. For the latter, which we call the noncooperative scenario, we are able to develop a result for N-player games with discrete (point) delays, where the players estimate only the diagonal entries of the Hessian matrix. Hence, we are also able to dominate sufficiently small off-diagonal terms using a small-gain argument [31] for the average system.

Our analysis combines a properly designed sequence of steps: averaging in infinite dimensions [32] and a Lyapunov functional [30] for the cooperative result, and a small-gain theorem for input-to-state stable (ISS) cascades of ODE and PDE dynamical systems [31] for the noncooperative result, in order to prove closed-loop stability. For both game scenarios, a small neighborhood of the Nash equilibrium is achieved, even in the presence of arbitrarily long (but fixed) delays. A numerical example with a two-player game illustrates our theoretical results.

2 Notation and Terminology

We denote the partial derivatives of a function u(x, t) as \(\partial _x u(x,t) = \partial u(x,t)/\partial x\), \(\partial _t u(x,t) = \partial u(x,t)/\partial t\). We conveniently use the compact notation \(u_x(x,t)\) and \(u_t(x,t)\) for the former and the latter, respectively. The 2-norm (Euclidean) of a finite-dimensional (ODE) state vector \(\vartheta (t)\) is denoted by single bars, \(|\vartheta (t)|\). In contrast, norms of functions (of x) are denoted by double bars. We denote the spatial \({\mathcal {L}}_2[0,D]\) norm of the PDE state u(x, t) as \(\Vert u(t)\Vert _{{\mathcal {L}}_2([0,D])}^2 {:=} \int _{0}^{D}u^2(x,t)\hbox {d}x\), where we drop the index \({\mathcal {L}}_2([0,D])\) and write it as \(\Vert \cdot \Vert = \Vert \cdot \Vert _{{\mathcal {L}}_2([0,D])}\), if not otherwise specified [28].

As defined in [33], a vector function \(f(t,\epsilon ) \in {\mathbb {R}}^n\) is said to be of order \({\mathcal {O}}(\epsilon )\) over an interval \([t_1,t_2]\), if \(\exists k,{\bar{\epsilon }}: |f(t,\epsilon )| \le k\epsilon , \forall \epsilon \in [0,{\bar{\epsilon }}]\ \text {and}\ \forall t \in [t_1,t_2]\). In most cases we use k and \({\bar{\epsilon }}\) as generic constants, and we use \({\mathcal {O}}(\epsilon )\) to be interpreted as an order of magnitude relation for sufficiently small \(\epsilon \).

The definitions of input-to-state stability (ISS) for PDE-based and ODE-based systems are assumed to be as provided in [31, 33], respectively.

Let \(A\subseteq {\mathbb {R}}^{n}\) be an open set. By \(C^{0}(A;\varOmega )\), we denote the class of continuous functions on A, which take values in \(\varOmega \subseteq {\mathbb {R}}^{m}\). By \(C^{k}(A;\varOmega )\), where \(k\ge 1\) is an integer, we denote the class of functions on \(A\subseteq {\mathbb {R}}^{n}\) with continuous derivatives of order k, which take values in \(\varOmega \subseteq {\mathbb {R}}^{m}\). In addition, \(C([a,b];{\mathbb {R}}^{n})\) is the Banach space of continuous functions mapping the interval [a, b] into \({\mathbb {R}}^{n}\), see [34, Chapter 2].

According to [34, 35], we assume the usual definitions for any delayed system \({\dot{x}}(t)=f(t,x_{t}), t\ge t_{0}\) and \(x(t_{0}+\varTheta )=\xi (\varTheta ), \varTheta \in [-D_{\max },0]\), where \(t_{0}\) is an arbitrary initial time instant \(t_{0}\ge 0\), \(x(t) \in {\mathbb {R}}^{N}\) is the state vector, \(D_{\mathrm{{max}}} > 0\) is the maximum time delay allowed, the history function of the delayed state is given by \(x_{t}(\varTheta )=x(t+\varTheta ) \in C([-D_{\max },0];{\mathbb {R}}^{N})\), and the functional initial condition \(\xi \) is also assumed to be continuous on \([-D_{\max },0]\). Without loss of generality, we consider \(t_0=0\) throughout the paper.

3 N-Player Games with Quadratic Payoffs and Delays: General Formulation

As discussed earlier, game theory provides an important framework for the mathematical modeling and analysis of scenarios involving different agents (players) whose actions are coupled, in the sense that their respective outcomes (outputs) \(y_{i}(t) \in {\mathbb {R}}\) do not depend exclusively on their own actions/strategies (input signals) \(\theta _{i}(t)\in {\mathbb {R}}\), with \(i=1 ,\ldots , N\), but at least on a subset of the others’. Moreover, defining \(\theta {:=} [\theta _1, \ldots , \theta _N]^T\), each player’s payoff function \(J_{i}(\theta ) : {\mathbb {R}}^{N} \rightarrow {\mathbb {R}}\) depends on the action \(\theta _j\) of at least one other Player j, \(j\not = i\). An N-tuple of actions \(\theta ^*\) is said to be in Nash equilibrium if no Player i can improve its payoff by unilaterally deviating from \(\theta _i^*\), this being so for all i [13]. Despite the vast number of publications on Nash equilibrium seeking [18], its study under time delays is still an open problem.

For instance, applications in economic analysis can be used to motivate the problem. Consider the pricing policies of two gas stations, each supplied by a different oil refinery. Basically, the price at the pumps is adjusted based on the current price of a barrel of oil and on stocks bought at previous values. Thus, the stations take time to pass on to consumers variations in the price of a barrel of oil at the refineries. In this game-theoretic context, each gas station can be viewed as a player, and the phenomenon described above can be interpreted as distinct delays \(D_{i}\) applied to their strategies (prices at the pumps). An increase in the pump price of the ith gas station results in lower, but not zero, sales of the ith gasoline \(y_{i}(t)\) and, consequently, increased sales for the other gas station; see Fig. 1.

Fig. 1

Nash equilibrium seeking schemes applied by two players (\(N=2\)) in a duopoly market structure with delayed players’ actions

Hence, we consider games where the payoff function of each player is quadratic, expressed as a strictly concave combination of their delayed actions

$$\begin{aligned} J_{i}(\theta (t-D))=\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\theta _{j}(t-D_{j})\theta _{k}(t-D_{k})+\sum _{j=1}^{N}h_{j}^{i}\theta _{j}(t-D_{j})+c_{i}, \end{aligned}$$
(1)

where \(\theta _{j}(t-D_{j}) \in {\mathbb {R}}\) is the decision variable of Player j delayed by \(D_{j} \in {\mathbb {R}}^{+}\) units of time, \(H_{jk}^{i}\), \(h_{j}^{i}\), \(c_{i} \in {\mathbb {R}}\) are constants, \(H_{ii}^{i} < 0\), \(H_{jk}^{i} = H_{kj}^{i}\) and \(\epsilon _{jk}^{i} = \epsilon _{kj}^{i} > 0\), \(\forall i,j,k\).

Without loss of generality, we assume that the inputs have distinct known (constant) delays which are ordered so that

$$\begin{aligned} D = \text {diag}\{D_1,D_2,\ldots , D_N\}, \quad 0\le D_1 \le \cdots \le D_N. \end{aligned}$$
(2)

Moreover, given any \({\mathbb {R}}^N\)-valued signal f, the notation \(f^D\) denotes

$$\begin{aligned} f^D(t) {:=} f(t-D)= \begin{bmatrix} f_1(t-D_1)&f_2(t-D_2)&\ldots&f_N(t-D_N) \end{bmatrix}^T. \end{aligned}$$
(3)

Quadratic payoff functions are of particular interest in game theory, firstly because they constitute second-order approximations to other types of non-quadratic payoff functions, and secondly because they are analytically tractable, leading in general to closed-form equilibrium solutions which provide insight into the properties and features of the equilibrium solution concept under consideration [13].

For the sake of completeness, we provide here in mathematical terms, the definition of a Nash equilibrium \(\theta ^*=[\theta ^*_1, \ldots ,\theta _N^*]^T\) in an N-player game:

$$\begin{aligned} J_i(\theta _i^*,\theta _{-i}^*) \ge J_i(\theta _i,\theta _{-i}^*), \quad \forall \theta _i \in \varTheta _i, \quad i \in \{1, \ldots ,N\}, \end{aligned}$$
(4)

where \(J_i\) is the payoff function of Player i, the term \(\theta _i\) corresponds to its action, while \(\varTheta _i\) is its action set and \(\theta _{-i}\) denotes the actions of the other players. Hence, no player has an incentive to unilaterally deviate its action from \(\theta ^*\). In the duopoly example just mentioned, \(\varTheta _1=\varTheta _2={\mathbb {R}}\), where \({\mathbb {R}}\) denotes the set of real numbers.

In order to determine the Nash equilibrium solution in strictly concave quadratic games with N players, where each action set is the entire real line, one should differentiate \(J_{i}\) with respect to \(\theta _{i}(t-D_{i}), \forall i=1 ,\ldots , N\), set the resulting expressions equal to zero, and solve the set of equations thus obtained. This set of equations, which also provides a sufficient condition due to the strict concavity, is

$$\begin{aligned} \sum _{j =1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\theta _{j}^{*}+h_{i}^{i}=0,\quad i=1 ,\ldots , N, \end{aligned}$$
(5)

which can be written in the form of matrices as

$$\begin{aligned} \begin{bmatrix} \epsilon _{11}^{1} H_{11}^{1} &{} \epsilon _{12}^{1} H_{12}^{1} &{} \ldots &{} \epsilon _{1N}^{1} H_{1N}^{1} \\ \epsilon _{21}^{2} H_{21}^{2} &{} \epsilon _{22}^{2} H_{22}^{2} &{} \ldots &{} \epsilon _{2N}^{2} H_{2N}^{2} \\ \vdots &{} \vdots &{} &{} \vdots \\ \epsilon _{N1}^{N} H_{N1}^{N} &{} \epsilon _{N2}^{N} H_{N2}^{N} &{} \ldots &{} \epsilon _{NN}^{N} H_{NN}^{N} \end{bmatrix} \begin{bmatrix} \theta _{1}^{*} \\ \theta _{2}^{*} \\ \vdots \\ \theta _{N}^{*} \end{bmatrix} =- \begin{bmatrix} h_{1}^{1} \\ h_{2}^{2} \\ \vdots \\ h_{N}^{N} \end{bmatrix} . \end{aligned}$$
(6)

Defining the Hessian matrix H and vectors \(\theta ^*\) and h by

$$\begin{aligned} H{:=} \begin{bmatrix} \epsilon _{11}^{1} H_{11}^{1} &{} \epsilon _{12}^{1} H_{12}^{1} &{} \ldots &{} \epsilon _{1N}^{1} H_{1N}^{1} \\ \epsilon _{21}^{2} H_{21}^{2} &{} \epsilon _{22}^{2} H_{22}^{2} &{} \ldots &{} \epsilon _{2N}^{2} H_{2N}^{2} \\ \vdots &{} \vdots &{} &{} \vdots \\ \epsilon _{N1}^{N} H_{N1}^{N} &{} \epsilon _{N2}^{N} H_{N2}^{N} &{} \ldots &{} \epsilon _{NN}^{N} H_{NN}^{N} \end{bmatrix} , \quad \theta ^{*}{:=} \begin{bmatrix} \theta _{1}^{*} \\ \theta _{2}^{*} \\ \vdots \\ \theta _{N}^{*} \end{bmatrix} , \quad h{:=} \begin{bmatrix} h_{1}^{1} \\ h_{2}^{2} \\ \vdots \\ h_{N}^{N} \end{bmatrix} , \end{aligned}$$
(7)

there exists only one Nash equilibrium at \(\theta ^{*}=-H^{-1}h\), if H is invertible:

$$\begin{aligned} \begin{bmatrix} \theta _{1}^{*} \\ \theta _{2}^{*} \\ \vdots \\ \theta _{N}^{*} \end{bmatrix} =- \begin{bmatrix} \epsilon _{11}^{1} H_{11}^{1} &{} \epsilon _{12}^{1} H_{12}^{1} &{} \ldots &{} \epsilon _{1N}^{1} H_{1N}^{1} \\ \epsilon _{21}^{2} H_{21}^{2} &{} \epsilon _{22}^{2} H_{22}^{2} &{} \ldots &{} \epsilon _{2N}^{2} H_{2N}^{2} \\ \vdots &{} \vdots &{} &{} \vdots \\ \epsilon _{N1}^{N} H_{N1}^{N} &{} \epsilon _{N2}^{N} H_{N2}^{N} &{} \ldots &{} \epsilon _{NN}^{N} H_{NN}^{N} \end{bmatrix}^{-1} \begin{bmatrix} h_{1}^{1} \\ h_{2}^{2} \\ \vdots \\ h_{N}^{N} \end{bmatrix} . \end{aligned}$$
(8)

For more details, see [13, Chapter 4].
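For concreteness, the short sketch below (with purely illustrative numerical values, not taken from any example in this paper) computes the Nash equilibrium (8) of a hypothetical two-player quadratic game and verifies the first-order conditions (5):

```python
import numpy as np

# Hypothetical combined Hessian H from (7) and vector h (illustrative values);
# the diagonal entries are negative (strict concavity) and H is invertible.
H = np.array([[-2.0, 0.5],
              [0.5, -3.0]])
h = np.array([4.0, 6.0])

# Unique Nash equilibrium theta* = -H^{-1} h, Eq. (8).
theta_star = -np.linalg.solve(H, h)

# theta* satisfies the first-order conditions (5): H theta* + h = 0.
residual = H @ theta_star + h
```

Using `solve` rather than explicitly inverting H is the standard numerically robust way to evaluate (8).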

The control objective is to design a novel extremum seeking-based strategy to reach the Nash equilibrium in (non)cooperative games subjected to distinct delays in the decision variables of the players (input signals).

Fig. 2

Block diagram illustrating the Nash equilibrium seeking strategy of Sect. 5 performed for each player. In red color, the predictor feedback used to compensate the individual delay \(D_i\) for the noncooperative case

Figure 2 contains a schematic diagram that summarizes the proposed Nash equilibrium seeking policy for each Player i, whose output is given by

$$\begin{aligned} y_{i}(t)&=J_{i}(\theta (t-D)), \end{aligned}$$
(9)

where the vector \(\theta _{-i}(t-D_{-i})\) represents the delayed actions of all other players. The additive-multiplicative dither signals \(S_i(t)\) and \(M_i(t)\) are

$$\begin{aligned} S_{i}(t)&=a_{i}\sin (\omega _{i}t+\omega _{i}D_{i}), \end{aligned}$$
(10)
$$\begin{aligned} M_{i}(t)&=\frac{2}{a_{i}}\sin (\omega _{i}t), \end{aligned}$$
(11)

with nonzero constant amplitudes \(a_i>0\) and frequencies \(\omega _i \ne \omega _j\) for \(i \ne j\). Such probing frequencies \(\omega _i\) can be selected as

$$\begin{aligned} \omega _i=\omega _{i}'\omega ={\mathcal {O}}(\omega ), \quad i\in \{1,2,\ldots , N\}, \end{aligned}$$
(12)

where \(\omega \) is a positive constant and \(\omega _{i}'\) is a rational number. One possible choice is given in [36] as

$$\begin{aligned} \omega _{i}'\not \in \left\{ \omega _{j}', \ \frac{1}{2}(\omega _{j}'+\omega _{k}'), \ \omega _{j}'+2\omega _{k}', \ \omega _{j}'+\omega _{k}'\pm \omega _{l}'\right\} , \end{aligned}$$
(13)

for all distinct i, j, k, and l. The remaining signals are defined throughout the paper depending on the type of game in question: cooperative games (Sect. 4) or noncooperative games (Sect. 5).
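The selection rule (13) can be checked mechanically with exact rational arithmetic. The helper below is a hypothetical sketch (not from the paper), returning `True` when a candidate set of \(\omega _{i}'\) avoids all the resonant combinations in (13):

```python
from fractions import Fraction
from itertools import permutations

def satisfies_condition_13(w):
    """Return True if each w[i] avoids the resonant values in (13):
    w_j, (w_j + w_k)/2, w_j + 2*w_k, and w_j + w_k +/- w_l, taken over
    all index combinations distinct from i and from each other."""
    n = len(w)
    for i in range(n):
        others = [m for m in range(n) if m != i]
        forbidden = {w[j] for j in others}
        for j, k in permutations(others, 2):
            forbidden.add((w[j] + w[k]) / 2)
            forbidden.add(w[j] + 2 * w[k])
        for j, k, l in permutations(others, 3):
            forbidden.add(w[j] + w[k] + w[l])
            forbidden.add(w[j] + w[k] - w[l])
        if w[i] in forbidden:
            return False
    return True

good = [Fraction(25), Fraction(22), Fraction(27)]   # passes the check
bad = [Fraction(2), Fraction(3), Fraction(4)]       # fails: 3 = (2 + 4)/2
```

For fewer than four players, the four-index terms are vacuously satisfied, since three distinct "other" indices do not exist.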

By considering \({\hat{\theta }}_{i}(t)\) as an estimate of \(\theta ^{*}_{i}\), one can define the estimation error:

$$\begin{aligned} {\tilde{\theta }}_{i}(t)&={\hat{\theta }}_{i}(t)-\theta _{i}^{*}. \end{aligned}$$
(14)

Then, from Eqs. (10), (14) and Fig. 2, it is easy to get

$$\begin{aligned} \theta _{i}(t)&= S_i(t)+{\hat{\theta }}_i(t) \end{aligned}$$
(15)
$$\begin{aligned}&=a_{i}\sin (\omega _{i}t+\omega _{i}D_{i})+{\tilde{\theta }}_{i}(t)+\theta _{i}^{*}, \end{aligned}$$
(16)

with the following time-delayed version

$$\begin{aligned} \theta _{i}(t-D_{i})&=a_{i}\sin (\omega _{i}t)+{\tilde{\theta }}_{i}(t-D_{i})+\theta _{i}^{*}. \end{aligned}$$
(17)

Therefore, from (1) and (17), the ith output signal in (9) can be rewritten as

$$\begin{aligned} y_{i}(t)&=\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\left[ a_{j}\sin (\omega _{j}t){\tilde{\theta }}_{k}(t-D_{k})+a_{k}\sin (\omega _{k}t){\tilde{\theta }}_{j}(t-D_{j})\right] +\nonumber \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\left[ \theta _{k}^{*}a_{j}\sin (\omega _{j}t)+\theta _{j}^{*}a_{k}\sin (\omega _{k}t)\right] +\nonumber \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\left[ \theta _{k}^{*}{\tilde{\theta }}_{j}(t-D_{j})+\theta _{j}^{*}{\tilde{\theta }}_{k}(t-D_{k})\right] +\nonumber \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}a_{j}a_{k}\sin (\omega _{j}t)\sin (\omega _{k}t)+\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\theta _{j}^{*}\theta _{k}^{*}+ \nonumber \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}{\tilde{\theta }}_{j}(t-D_{j}){\tilde{\theta }}_{k}(t-D_{k})+\nonumber \\&\quad +\sum _{j=1}^{N}h_{j}^{i}a_{j}\sin (\omega _{j}t)+\sum _{j=1}^{N}h_{j}^{i}{\tilde{\theta }}_{j}(t-D_{j})+\sum _{j=1}^{N}h_{j}^{i}\theta _{j}^{*}+c_{i}. \end{aligned}$$
(18)

The estimate \({\hat{G}}_{i}\) of the unknown gradient of each payoff \(J_i\) is given by

$$\begin{aligned} {\hat{G}}_{i}(t)&=M_{i}(t)y_{i}(t). \end{aligned}$$
(19)

Plugging (11) and (18) into (19) and computing the average of the resulting signal leads to

$$\begin{aligned} {\hat{G}}_{i}^\mathrm{{av}}(t)&=\frac{1}{\varPi }\int _0^\varPi M_i(\tau ) y_i \hbox {d}\tau =\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}{\tilde{\theta }}_{j}^\mathrm{{av}}(t-D_{j})+\underbrace{\sum \nolimits _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\theta _{j}^{*}+h_{i}^{i}}_{{=0, \text { from } (5)}}, \nonumber \\&=\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}{\tilde{\theta }}_{j}^\mathrm{{av}}(t-D_{j}). \end{aligned}$$
(20)

where \(\varPi \) is defined as

$$\begin{aligned} \varPi {:=} 2 \pi \times \text {LCM}\left\{ \frac{1}{\omega _i} \right\} , \end{aligned}$$
(21)

and LCM standing for the least common multiple.
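The averaging identity (20) can be verified numerically. The sketch below uses an illustrative two-player quadratic game (all values hypothetical), freezes the estimation errors, and omits the delays for simplicity; averaging the demodulated output \(M_i(t)y_i(t)\) over the common period \(\varPi = 2\pi \) then recovers \(\sum _{j}\epsilon _{ij}^{i}H_{ij}^{i}{\tilde{\theta }}_{j}\):

```python
import numpy as np

# Illustrative 2-player quadratic game: Q_i is player i's symmetric payoff
# Hessian, so row i of Q_i is row i of the matrix H in (7).
Q1 = np.array([[-2.0, 0.5], [0.5, 1.0]])
Q2 = np.array([[1.0, 0.5], [0.5, -3.0]])
b = np.array([[4.0, 0.0], [0.0, 6.0]])     # linear coefficients h^i_j
H = np.array([Q1[0], Q2[1]])               # matrix H in (7)
theta_star = -np.linalg.solve(H, np.array([b[0, 0], b[1, 1]]))
theta_tilde = np.array([0.3, -0.2])        # frozen estimation errors

a = np.array([0.1, 0.1])                   # dither amplitudes
w = np.array([25.0, 22.0])                 # probing frequencies (rad/s)
M = 200000
t = np.arange(M) * (2 * np.pi / M)         # one common period, uniform grid

theta = (theta_star + theta_tilde)[:, None] + a[:, None] * np.sin(w[:, None] * t)
y = np.array([0.5 * np.einsum('it,ij,jt->t', theta, Q1, theta) + b[0] @ theta,
              0.5 * np.einsum('it,ij,jt->t', theta, Q2, theta) + b[1] @ theta])

# Demodulate with M_i(t) = (2/a_i) sin(w_i t) and average over the period.
G_av = np.array([np.mean((2 / a[i]) * np.sin(w[i] * t) * y[i]) for i in range(2)])
```

Because the two probing frequencies satisfy the relevant nonresonance conditions, all cross-terms average to zero over the period and only the gradient term in (20) survives.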

At this point, if we neglect the prediction loop and the low-pass filter (both indicated in red color) in Fig. 2, the control law \(U_i(t)=k_i{\hat{G}}_i(t)\) could be obtained as in the classical ES approach. In this case, from Eqs. (14) and (20), we could write the average version of

$$\begin{aligned} \dot{{\tilde{\theta }}}_i(t)=U_i(t) \end{aligned}$$
(22)

such as

$$\begin{aligned} \dot{{\tilde{\theta }}}_{i}^{\mathrm{{av}}}(t)&=k_{i}{\hat{G}}_{i}^{\mathrm{{av}}}(t) \nonumber \\&=k_{i}\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}{\tilde{\theta }}_{j}^{\mathrm{{av}}}(t-D_{j}). \end{aligned}$$
(23)

Therefore, by defining \({\tilde{\theta }}^{\mathrm{{av}}}(t): = [{\tilde{\theta }}_{1}^{\mathrm{{av}}}(t),{\tilde{\theta }}_{2}^{\mathrm{{av}}}(t),\ldots ,{\tilde{\theta }}_{N}^{\mathrm{{av}}}(t)]^T \in {\mathbb {R}}^{N} \) in order to take into account all players, one has

$$\begin{aligned} \dot{{\tilde{\theta }}}^{\mathrm{{av}}}(t)&=KH{\tilde{\theta }}^{\mathrm{{av}}}(t-D), \end{aligned}$$
(24)

with \(K{:=}\text {diag}\{k_{1},\ldots ,k_{N}\}\) and H given by (7). Equation (24) means that even if KH were a Hurwitz matrix, the equilibrium \({\tilde{\theta }}_{\mathrm{{e}}}^{\mathrm{{av}}}=0\) of the closed-loop average system would not necessarily be stable for arbitrary values of the time delays \(D_i\). This reinforces the need to employ the predictor feedback \(U_i(t)=k_i{\hat{G}}_i(t+D_i)\)—or even its filtered version—for each player in order to collectively stabilize the closed-loop system, as illustrated in red in Fig. 2.
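This delay sensitivity is easy to reproduce numerically. In the scalar case \(\dot{{\tilde{\theta }}}^{\mathrm{{av}}}(t)=kH{\tilde{\theta }}^{\mathrm{{av}}}(t-D)\) with \(kH=-1\), the origin is exponentially stable for \(D<\pi /2\) and unstable for \(D>\pi /2\); the sketch below (illustrative parameters, simple Euler discretization) exhibits both regimes:

```python
import numpy as np

def simulate_delayed(kH, D, T=40.0, dt=0.001, x0=1.0):
    """Euler simulation of x'(t) = kH * x(t - D) with constant history x0."""
    n = int(T / dt)
    d = int(round(D / dt))
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        delayed = x0 if i < d else x[i - d]   # history x(t) = x0 for t < 0
        x[i + 1] = x[i] + dt * kH * delayed
    return x

stable = simulate_delayed(kH=-1.0, D=0.5)    # D < pi/2: decays to zero
unstable = simulate_delayed(kH=-1.0, D=2.0)  # D > pi/2: oscillates and grows
```

With D = 0.5 the trajectory decays, while with D = 2 the same gain produces growing oscillations, matching the classical stability boundary \(kD=\pi /2\) of the scalar delayed integrator.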

On the other hand, the derivative of (20) is

$$\begin{aligned} \dot{{\hat{G}}}_{i}^{\mathrm{{av}}}(t)&=\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\dot{{\tilde{\theta }}}_{j}^{\mathrm{{av}}}(t-D_{j}), \end{aligned}$$
(25)

and delaying by \(D_i\) units the time-argument of both sides of the average version of (22), we obtain

$$\begin{aligned} \dot{{\tilde{\theta }}}_{i}^{\mathrm{{av}}}(t-D_{i})=U_{i}^{\mathrm{{av}}}(t-D_{i}). \end{aligned}$$
(26)

Thus, Eq. (25) can be rewritten as

$$\begin{aligned} \dot{{\hat{G}}}_{i}^{\mathrm{{av}}}(t)&=\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}U_{j}^{\mathrm{{av}}}(t-D_{j}). \end{aligned}$$
(27)

Taking into account all players, from (20) and (27), it is possible to find a compact form for the overall average estimated gradient \({\hat{G}}^{\mathrm{{av}}}(t) {:=} [{\hat{G}}_{1}^{\mathrm{{av}}}(t),\ldots ,{\hat{G}}_{N}^{\mathrm{{av}}}(t)]^T \in {\mathbb {R}}^{N}\) according to

$$\begin{aligned} {\hat{G}}^{\mathrm{{av}}}(t)&=H {\tilde{\theta }}^{\mathrm{{av}}}(t-D), \end{aligned}$$
(28)
$$\begin{aligned} \dot{{\hat{G}}}^{\mathrm{{av}}}(t)&=H U^{\mathrm{{av}}}(t-D), \end{aligned}$$
(29)

where H is given in (7) and \( U^{\mathrm{{av}}}(t) {:=} [U_{1}^{\mathrm{{av}}}(t),U_{2}^{\mathrm{{av}}}(t),\ldots ,U_{N}^{\mathrm{{av}}}(t)]^T \in {\mathbb {R}}^{N}\).

Throughout the paper, the key idea is to design control laws (policies) for each player in order to reach a small neighborhood of the Nash equilibrium point. To this end, we use an extremum seeking strategy based on predictor feedback to compensate multiple and distinct delays in the players’ actions. Basically, the control laws are able to ensure exponential stabilization of \({\hat{G}}^{\mathrm{{av}}}(t)\) and, consequently, of \({\tilde{\theta }}^{\mathrm{{av}}}(t)\). From (28), it is clear that if H is invertible, then \({\tilde{\theta }}^{\mathrm{{av}}}(t) \rightarrow 0\) as \({\hat{G}}^{\mathrm{{av}}}(t) \rightarrow 0\). Hence, the convergence of \({\tilde{\theta }}^{\mathrm{{av}}}(t)\) to the origin results in the convergence of \(\theta (t)\) to a small neighborhood of \(\theta ^*\) in (4), via averaging theory [32].

4 Cooperative Scenario with Delays

In the cooperative scenario, the purpose of the extremum seeking is to estimate the Nash equilibrium vector \(\theta ^*\) by sharing among the players their outputs (payoffs)

$$\begin{aligned} y_i(t)=J_i(\theta (t-D)) \end{aligned}$$
(30)

in (1), as well as their own actions \(\theta _i(t)\) and control laws \(U_i(t)\). In this sense, we are able to formulate the closed-loop system in a centralized fashion, within a multivariable framework in which each state variable corresponds to one player.

In this section, \(e_i \in {\mathbb {R}}^N\) stands for the i-th column of the identity matrix \(I_N \in {\mathbb {R}}^{N \times N}\) for each \(i \in \{1, 2,\ldots , N\}\).

4.1 Centralized Predictor with Shared Hessian Information Among Players

To this end, we redefine the perturbation signals in (10) and (11) in vector form, with S(t) and \(M(t) \in {\mathbb {R}}^N\) given by

$$\begin{aligned}&S(t) = \begin{bmatrix} a_1\sin (\omega _1(t+D_1))&\cdots&a_N\sin (\omega _N(t+D_N)) \end{bmatrix}^T, \end{aligned}$$
(31)
$$\begin{aligned}&M(t) = \begin{bmatrix} \dfrac{2}{a_1}\sin (\omega _1 t)&\cdots&\dfrac{2}{a_N}\sin (\omega _N t) \end{bmatrix}^T. \end{aligned}$$
(32)

Notice that the delayed signal \(S^D\) of S is a conventional perturbation signal used in [36]. We also set the matrix-valued signal \(N(t) \in {\mathbb {R}}^{N \times N}\) as

$$\begin{aligned} N_{ij}(t) = \left\{ \begin{array}{l} \dfrac{16}{a^2_i} \bigg ( \sin ^2 (\omega _i t) - \dfrac{1}{2} \bigg ), \quad ~~i=j, \\ \dfrac{4}{a_i a_j} \sin (\omega _i t) \sin (\omega _j t), \quad i \ne j. \end{array} \right. \end{aligned}$$
(33)

By using the above signals, we develop a multivariable extremum seeking scheme in order to collectively compensate the multiple input delays. Let the input signals in (15) be constructed in vector form as

$$\begin{aligned} \theta (t) = {\hat{\theta }}(t)+S(t), \end{aligned}$$
(34)

where \({\hat{\theta }}\) is an estimate of \(\theta ^*\). We also introduce the estimation error

$$\begin{aligned} {\tilde{\theta }}(t) {:=} {\hat{\theta }}^D(t) - \theta ^*. \end{aligned}$$
(35)

Note that the error is defined here with \({\hat{\theta }}^D\) rather than \({\hat{\theta }}\). With this error variable, the individual output signals or payoffs \(y_i(t)\) can be rewritten as

$$\begin{aligned} y_i(t) = J_i \left( \theta ^* + {\tilde{\theta }}(t) + S^D(t) \right) . \end{aligned}$$
(36)

To compensate the delays, we propose the following predictor-based update law:

$$\begin{aligned}&\dot{{\hat{\theta }}}(t) = U(t), \end{aligned}$$
(37)
$$\begin{aligned}&{\dot{U}}_i(t) = -\,cU_i(t) + ck_i \bigg (M_i(t) y_i(t) + {N_i(t)y_i(t)} \sum _{j=1}^{N}e_j \int _{t-D_j}^{t} U_j(\tau )\hbox {d}\tau \bigg ), \nonumber \\ \end{aligned}$$
(38)

for some positive constants \(c, k_i >0\) with \(k_i\) being the elements of the diagonal matrix \(K \in {\mathbb {R}}^{N \times N}\). Without loss of generality, we consider \(c_i=c\) in Fig. 2. Moreover, in the particular case of the cooperative scenario, the signal \(N_i(t)\) used in (38) is simply defined by

$$\begin{aligned} {N_i(t){:=}[N_{i1}(t) \ldots ~N_{iN}(t)]}, \end{aligned}$$
(39)

representing the vector with the elements of each row of the matrix N(t) with \(N_{ij}\) given in (33).

Since \(\dot{\hat{\theta }}^D(t) = U^D(t)\), differentiating the error variable \({\tilde{\theta }}\) with respect to t yields

$$\begin{aligned} \dot{{\tilde{\theta }}}(t) = U^D(t) = \sum _{i=1}^{N}e_iU_i(t-D_i), \end{aligned}$$
(40)

which is in a standard form of a system with input delays. As we will see later, the terms in the parentheses on the right-hand side of (38) correspond to a predicted value of \(H{\tilde{\theta }}\) at some time in the future in the average sense, i.e.,

$$\begin{aligned} {\hat{G}}^{\mathrm{{av}}}(t+D)= {\hat{G}}^{\mathrm{{av}}}(t) + H \sum _{i=1}^{N}e_i \int _{t-D_i}^{t} U_i^{\mathrm{{av}}}(\tau )\hbox {d}\tau . \end{aligned}$$
(41)

We obtain the future state (41) by simply applying the variation of constants formula to (29).
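In the scalar case, (41) reduces to an elementary identity that can be checked in closed form. The sketch below (illustrative values; \(U^{\mathrm{{av}}}(t)=\sin t\) chosen for concreteness) confirms that \({\hat{G}}^{\mathrm{{av}}}(t+D)={\hat{G}}^{\mathrm{{av}}}(t)+H\int _{t-D}^{t}U^{\mathrm{{av}}}(\tau )\hbox {d}\tau \) along the solution of (29):

```python
import numpy as np

# Scalar instance of (29): Gdot(t) = H * U(t - D), with U(t) = sin(t).
H, D, G0 = 2.0, 0.7, 1.0   # illustrative constants

def G(t):
    # Closed-form solution of Gdot(t) = H*sin(t - D) with G(0) = G0.
    return G0 + H * (np.cos(D) - np.cos(t - D))

def integral_U(t):
    # integral_{t-D}^{t} sin(tau) dtau in closed form.
    return np.cos(t - D) - np.cos(t)

# The prediction formula (41) holds at every time instant.
for t in np.linspace(0.0, 10.0, 11):
    assert abs(G(t + D) - (G(t) + H * integral_U(t))) < 1e-12
```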

4.2 Reduction Approach: Exponential Stability Deduced from the Explicit Solutions

For the sake of simplicity, we assume \(c \rightarrow +\infty \) in (38) resulting in the following expression:

$$\begin{aligned} U_i(t)=k_i\left( M_i(t) y_i(t) + {N_i(t)y_i(t)} \sum _{j=1}^{N}e_j \int _{t-D_j}^{t} U_j(\tau )\hbox {d}\tau \right) , \end{aligned}$$
(42)

such that the delayed closed-loop system (40) and (42) can be written in the corresponding PDE representation form, given by [29]:

$$\begin{aligned}&\dot{{\tilde{\theta }}}(t) = \sum _{i=1}^{N}e_i u_i(0, t), \end{aligned}$$
(43)
$$\begin{aligned}&\partial _t u_i(x,t)=\partial _x u_i(x,t), \quad x\in ~ {]0,D_i[}, \quad i=1, 2, \ldots ,N, \end{aligned}$$
(44)
$$\begin{aligned}&u_i(D_i,t)=U_i(t) . \end{aligned}$$
(45)

The relation between \(u_i\) and \(U_i\) is given by \(u_i(x,t)=U_i(x+t-D_i)\).

In the reduction approach [37] (or finite-spectrum assignment), we use the transformation

$$\begin{aligned} Z(t)= & {} {\tilde{\theta }}(t)+ \sum _{i=1}^{N} \int _{t}^{t+D_i}e_i U_i(\tau -D_i) \hbox {d}\tau \nonumber \\= & {} {\tilde{\theta }}(t) + \sum _{i=1}^{N} \int _{0}^{D_i}e_i u_i(\xi ,t) \hbox {d}\xi . \end{aligned}$$
(46)

It is not difficult to see that Z satisfies

$$\begin{aligned} {\dot{Z}}(t)=\sum _{i=1}^{N}e_i U_i(t). \end{aligned}$$
(47)

This is the key fact in the reduction approach since Eq. (47) can be written in the simple form

$$\begin{aligned} {\dot{Z}}(t)=U(t). \end{aligned}$$
(48)

By employing the feedback law \(U(t)=-{\bar{K}}Z(t)\) in (48), which replaces (43), with \({\bar{K}}>0\) being a diagonal matrix \({\bar{K}}=\text {diag}({\bar{k}}_1 ~\cdots ~ {\bar{k}}_N)\), \({\bar{k}}_i>0\), the closed-loop system (43)–(45) becomes

$$\begin{aligned}&{\dot{Z}}(t) = -\,{\bar{K}} Z(t), \end{aligned}$$
(49)
$$\begin{aligned}&\partial _t u_i(x,t)=\partial _x u_i(x,t), \quad x\in ~ {]0,D_i[}, \quad i=1, 2, \ldots ,N, \end{aligned}$$
(50)
$$\begin{aligned}&u_i(D_i,t)=-{\bar{k}}_iZ_i(t) . \end{aligned}$$
(51)

Exponential stability of the closed-loop system can be shown directly from (49)–(51), since the solution of the Z–subsystem is easily calculated as

$$\begin{aligned} Z(t)= \exp (-{\bar{K}}t)Z(0). \end{aligned}$$
(52)

Hence, \(Z(t) \rightarrow 0\) exponentially as \(t \rightarrow +\infty \). Then, for \(t>D_i\), the solution to the \(u_i\)–subsystem in (51) is obtained as

$$\begin{aligned} u_i(x,t)= -\,{\bar{k}}_i e_i^T \exp (-{\bar{K}}(x+t-D_i))Z(0), \quad x \in ~ {]0,D_i[}, \quad t>D_i. \end{aligned}$$
(53)

Clearly, for each \(x\in ~ {]0,D_i[}\), the state variables \(u_i(x,t)\), and consequently u(x, t), converge to 0 as \(t \rightarrow +\infty \). The rate of convergence is exponential.
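A scalar sketch (illustrative values) shows the mechanism at work: with the same delay D = 2 that destabilizes uncompensated feedback of a delayed integrator, the reduction-approach law \(U(t)=-{\bar{k}}Z(t)\), with Z built from the running integral of U as in (46), recovers exponential convergence. Note that this idealized law uses \({\tilde{\theta }}\) directly; it is a check of the mechanism, not an implementable controller:

```python
import numpy as np

def simulate_predictor(kbar, D, T=40.0, dt=0.001, x0=1.0):
    """Euler simulation of x'(t) = U(t - D) with the reduction-approach
    feedback U(t) = -kbar*Z(t), Z(t) = x(t) + integral_{t-D}^{t} U, cf. (46)."""
    n = int(T / dt)
    d = int(round(D / dt))
    x = np.empty(n + 1)
    x[0] = x0
    U = np.zeros(n)                               # zero control history, t < 0
    for i in range(n):
        Z = x[i] + dt * np.sum(U[max(0, i - d):i])   # transformation (46)
        U[i] = -kbar * Z                             # feedback on Z, cf. (49)
        x[i + 1] = x[i] + dt * (U[i - d] if i >= d else 0.0)
    return x

x = simulate_predictor(kbar=1.0, D=2.0)   # delay that broke the uncompensated loop
```

The state holds its initial value until the control reaches it at t = D, then decays exponentially, consistent with \(Z(t)=\exp (-{\bar{K}}t)Z(0)\) in (52).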

However, the control law \(U(t)=-{\bar{K}}Z(t)\) cannot be implemented, since Z(t) in (46) is constructed with the unmeasured signal \({\tilde{\theta }}(t)\) in (35). Nevertheless, in the average sense, the average version of (42) returns

$$\begin{aligned} U^{\mathrm{{av}}}(t)=\underbrace{KH}_{-{\bar{K}}}\left( \underbrace{{\tilde{\theta }}^{\mathrm{{av}}}(t) + \sum \nolimits _{j=1}^{N}e_j \int _{t-D_j}^{t} U^{\mathrm{{av}}}_j(\tau )\hbox {d}\tau }_{Z^{\mathrm{{av}}}(t)}\right) , \end{aligned}$$
(54)

since

$$\begin{aligned} {\hat{G}}_{i}^{\mathrm{{av}}}(t)&=\frac{1}{\varPi }\int _0^\varPi M_i(\tau ) y_i \hbox {d}\tau =\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}{\tilde{\theta }}_{j}^{\mathrm{{av}}}(t-D_{j}), \end{aligned}$$
(55)
$$\begin{aligned} {\hat{H}}_{i}^{\mathrm{{av}}}(t)&=\frac{1}{\varPi }\int _0^\varPi N_i(\tau ) y_i \hbox {d}\tau =\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}e_j^T, \end{aligned}$$
(56)

or, equivalently, \({\hat{G}}^{\mathrm{{av}}}(t)= H{\tilde{\theta }}^{\mathrm{{av}}}(t-D)\) and \({\hat{H}}^{\mathrm{{av}}}(t)=H\), where \({\hat{H}}_{i}^{\mathrm{{av}}}\) is the ith row vector of the Hessian H.
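The Hessian identity (56) can likewise be checked numerically. The sketch below sets up an illustrative two-player quadratic game (hypothetical values), freezes the estimation errors and omits delays for simplicity, and verifies that averaging \(N_i(t)y_i(t)\) over a common period recovers row i of H:

```python
import numpy as np

# Illustrative 2-player quadratic game: Q_i is player i's symmetric payoff
# Hessian, so row i of Q_i is row i of the matrix H in (7).
Q1 = np.array([[-2.0, 0.5], [0.5, 1.0]])
Q2 = np.array([[1.0, 0.5], [0.5, -3.0]])
b = np.array([[4.0, 0.0], [0.0, 6.0]])
H = np.array([Q1[0], Q2[1]])
theta_star = -np.linalg.solve(H, np.array([4.0, 6.0]))
theta_tilde = np.array([0.3, -0.2])        # frozen estimation errors

a = np.array([0.1, 0.1])
w = np.array([25.0, 22.0])
M = 200000
t = np.arange(M) * (2 * np.pi / M)         # one common period, uniform grid

theta = (theta_star + theta_tilde)[:, None] + a[:, None] * np.sin(w[:, None] * t)
y = np.array([0.5 * np.einsum('it,ij,jt->t', theta, Q1, theta) + b[0] @ theta,
              0.5 * np.einsum('it,ij,jt->t', theta, Q2, theta) + b[1] @ theta])

def N_ij(i, j):
    # Demodulation signals (33).
    if i == j:
        return (16 / a[i] ** 2) * (np.sin(w[i] * t) ** 2 - 0.5)
    return (4 / (a[i] * a[j])) * np.sin(w[i] * t) * np.sin(w[j] * t)

H_av = np.array([[np.mean(N_ij(i, j) * y[i]) for j in range(2)] for i in range(2)])
```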

In the next sub-section, we show that the average control law (54)—or its equivalent filtered version in (38)—is indeed able to stabilize the closed-loop system (43)–(45) in the average sense.

4.3 Stability Analysis

In what follows, we make the same assumption as in [18] concerning the Hessian matrix H, which describes the interactions among the players in our cooperative game.

Assumption 1

The Hessian matrix H given by (7) is strictly diagonally dominant, i.e.,

$$\begin{aligned} \sum _{j\ne i}^{N}|\epsilon _{ij}^{i}H_{ij}^{i}| < |\epsilon _{ii}^{i}H_{ii}^{i}|, \quad i \in \{1, \ldots , N\}. \end{aligned}$$
(57)

By Assumption 1, the Nash equilibrium \(\theta ^*\) exists and is unique since strictly diagonally dominant matrices are nonsingular by the Levy–Desplanques theorem  [38]. To attain \(\theta ^*\) stably in real time, without any model information (except for the delays \(D_i\)), each Player i employs the cooperative extremum seeking strategy (38) via predictor feedback with shared information. The next theorem provides the stability/convergence properties of the closed-loop extremum seeking feedback for the quadratic N-player cooperative game with delays under the cooperative scenario.
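A minimal numerical illustration of the Levy–Desplanques argument, with a made-up 3×3 strictly diagonally dominant matrix standing in for the Hessian:

```python
# Hypothetical strictly diagonally dominant "Hessian" with negative diagonal,
# in the spirit of Assumption 1 (values are illustrative only).
H = [[-3.0, 1.0, 0.5],
     [0.5, -2.0, 1.0],
     [1.0, 0.5, -4.0]]

def det3(M):
    # determinant by cofactor expansion along the first row
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# strict diagonal dominance, as in (57)
for i in range(3):
    assert sum(abs(H[i][j]) for j in range(3) if j != i) < abs(H[i][i])

# Levy-Desplanques: a strictly diagonally dominant matrix is nonsingular
assert abs(det3(H)) > 0
```

Nonsingularity of H is what guarantees that the quadratic game has a unique Nash equilibrium.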

Theorem 4.1

Consider the closed-loop system (38)–(40) under Assumption 1 and multiple and distinct input delays \(D_i\) for an N-player game with payoff functions given in (1) and under the cooperative scenario. There exists a constant \(c^*>0\) such that \(\forall c \ge c^*\), \(~\exists ~\omega ^*(c)>0\) such that \(\forall \omega >\omega ^*\), the closed-loop delayed system (38) and (40) with state \({{\tilde{\theta }}}_i(t-D_i)\), \(U_i(\tau )\), \(\forall \tau \in [t-D_i,t]\) and \(\forall i\in \{1,2,\ldots , N\}\), has a unique locally exponentially stable periodic solution in t of period \(\varPi \), denoted by \({\tilde{\theta }}_i^{\varPi }(t-D_i)\), \(U_i^{\varPi }(\tau )\), \(\forall \tau \in [t-D_i,t]\), satisfying, \(\forall t\ge 0\):

$$\begin{aligned} \left( \sum _{i=1}^{N}\left[ {\tilde{\theta }}_{i}^{\varPi }(t-D_i)\right] ^2 + \left[ U_{i}^{\varPi }(t)\right] ^2 + \int _{t-D_i}^{t} \left[ U_{i}^{\varPi }(\tau )\right] ^2\hbox {d}\tau \right) ^{1/2} \le {\mathcal {O}}(1/\omega ). \end{aligned}$$
(58)

Furthermore,

$$\begin{aligned} \limsup _{t\rightarrow +\infty }|\theta (t)-\theta ^*|= & {} {\mathcal {O}}(|a|+1/\omega ), \end{aligned}$$
(59)

where \(a=[a_1 \ a_2 \ \cdots \ a_N]^T\) and \(\theta ^*\) is the unique Nash equilibrium given by (8).

Proof

The proof is carried out in 5 steps as follows.

First, in Steps 1 and 2, we write the equations of the closed-loop system as well as its corresponding average version by employing a PDE representation for the transport delays. In Step 3, we show the exponential stability of the average closed-loop system using a Lyapunov–Krasovskii functional. Then, we invoke the averaging theorem for infinite-dimensional systems [32] in Step 4 to show the exponential stability of the original closed-loop system. Finally, Step 5 shows the convergence of \(\theta (t)\) to a small neighborhood of the Nash equilibrium \(\theta ^*\).

Step 1: Closed-loop System

A PDE representation of the closed-loop system (38), (40) is given by

$$\begin{aligned}&\dot{{\tilde{\theta }}}(t) = u(0, t), \end{aligned}$$
(60)
$$\begin{aligned}&u_t(x, t) = D^{-1}u_x(x, t), \quad x \in ~ {]0,1[}, \end{aligned}$$
(61)
$$\begin{aligned}&u(1, t) = U(t), \end{aligned}$$
(62)
$$\begin{aligned}&{\dot{U}}(t) = -\,cU(t) + cK \bigg ({{\hat{G}}(t)} + {{\hat{H}}(t)} \int _{0}^{1} Du(x,t)\hbox {d}x \bigg ), \end{aligned}$$
(63)

where \(u(x,t) = (u_1(x,t), u_2(x,t), \cdots ,u_N(x,t))^T \in {\mathbb {R}}^N\) and

$$\begin{aligned} {{\hat{G}}(t)}= & {} [M_1(t) y_1(t) ~\ldots ~ M_N(t) y_N(t)]^T\in {\mathbb {R}}^{N}, \end{aligned}$$
(64)
$$\begin{aligned} {{\hat{H}}(t)}= & {} [N_1(t) y_1(t) ~\ldots ~ N_N(t) y_N(t)]^T ~\in {\mathbb {R}}^{N\times N}. \end{aligned}$$
(65)

It is easy to see that the solution of (61) under the condition (62) is represented as

$$\begin{aligned} u_i(x,t) = U_i(D_ix + t - D_i) \end{aligned}$$
(66)

for each \(i \in \{1, 2, \ldots , N\}\). Hence, we have

$$\begin{aligned} \int _{0}^{1}D_iu_i(x,t)\hbox {d}x= & {} \int _{t-D_i}^{t}u_i\left( \dfrac{\tau -t+D_i}{D_i}, t\right) \hbox {d}\tau \nonumber \\= & {} \int _{t-D_i}^{t}U_i(\tau )\hbox {d}\tau . \end{aligned}$$
(67)

This means that

$$\begin{aligned} \int _{0}^{1}Du(x,t)\hbox {d}x= & {} \sum _{i=1}^{N}e_i \int _{0}^{1}D_iu_i(x,t)\hbox {d}x \nonumber \\= & {} \sum _{i=1}^{N}e_i \int _{t-D_i}^{t}U_i(\tau )\hbox {d}\tau . \end{aligned}$$
(68)

Thus, we can recover (40) from (61) to (63).
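The change of variables behind (67) can be sanity-checked numerically; the sketch below uses an arbitrary test input \(U(\tau )=\sin \tau \) and a hypothetical delay \(D=0.7\):

```python
import math

D, t = 0.7, 2.0   # hypothetical delay and evaluation time (t > D)
U = math.sin      # arbitrary smooth test input

def u(x, t):
    # transport-PDE solution (66): u(x, t) = U(D*x + t - D)
    return U(D * x + t - D)

n = 20_000
# left-hand side of (67): midpoint rule in x over [0, 1]
lhs = sum(D * u((k + 0.5) / n, t) for k in range(n)) / n
# right-hand side of (67): exact integral of sin over [t - D, t]
rhs = math.cos(t - D) - math.cos(t)
assert abs(lhs - rhs) < 1e-8
```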

Step 2: Average Closed-loop System

The average system associated with (60)–(63) is given by

$$\begin{aligned}&\dot{{\tilde{\theta }}}_\mathrm{av}(t) = u_\mathrm{av}(0, t), \end{aligned}$$
(69)
$$\begin{aligned}&u_\mathrm{{av},t}(x, t) = D^{-1}u_{\mathrm{{av}},x}(x, t), \quad x \in ~ {]0,1[}, \end{aligned}$$
(70)
$$\begin{aligned}&u_\mathrm{av}(1, t) = U_\mathrm{av}(t), \end{aligned}$$
(71)
$$\begin{aligned}&{\dot{U}}_\mathrm{av}(t) = -\,cU_\mathrm{av}(t) + cKH \left( {\tilde{\theta }}_\mathrm{av}(t) + \int _{0}^{1} Du_\mathrm{av}(x,t)\hbox {d}x \right) , \end{aligned}$$
(72)

where we have used the fact that the averages of \({\hat{G}}(t)\) and \({\hat{H}}(t)\) are given by \(H{\tilde{\theta }}_\mathrm{av}(t)\) and H, respectively.

Step 3: Exponential Stability via Lyapunov–Krasovskii Functional

For simplicity of notation, let us introduce the following auxiliary variables

$$\begin{aligned}&\vartheta (t){:=} H \left( {\tilde{\theta }}_\mathrm{av}(t) + \int _{0}^{1} Du_\mathrm{av}(x,t)\hbox {d}x \right) , \end{aligned}$$
(73)
$$\begin{aligned}&{\tilde{U}} = U_\mathrm{av} - K\vartheta . \end{aligned}$$
(74)

With this notation, (72) can be represented simply as \({\dot{U}}_\mathrm{av} = -\,c{\tilde{U}}\). In addition, differentiating (73) with respect to t yields

$$\begin{aligned} {\dot{\vartheta }} = HU_\mathrm{av}(t). \end{aligned}$$
(75)

We prove the exponential stability of the closed-loop system by using the Lyapunov functional defined by

$$\begin{aligned} V(t)&= \vartheta (t)^TK\vartheta (t) + \dfrac{1}{4}\lambda _\mathrm{min}(-H) \int _{0}^{1} (1+x)u_\mathrm{av}(x,t)^T Du_\mathrm{av}(x,t)\hbox {d}x \nonumber \\&\qquad +\dfrac{1}{2}{\tilde{U}}(t)^T(-H){\tilde{U}}(t). \end{aligned}$$
(76)

Recall that K and D are diagonal matrices with positive entries and that H is a negative-definite matrix. Hence, all of K, D, and \(-H\) are positive-definite matrices. In particular, we use the Gershgorin circle theorem [38, Theorem 6.1.1] to guarantee \(\lambda (H)\subseteq \bigcup _{i=1}^{N}\rho _i\), where \(\lambda (H)\) denotes the spectrum of H and \(\rho _i\) is a Gershgorin disc:

$$\begin{aligned} \rho _i=\left\{ z \in {\mathbb {C}} ~ {:} ~ |z-\epsilon _{ii}^{i} H_{ii}^{i}| \le \sum _{j\ne i}\left| \epsilon _{ij}^{i}H_{ij}^{i}\right| \right\} . \end{aligned}$$
(77)

Since \(\epsilon _{ii}^{i}H_{ii}^{i}<0\) and H is strictly diagonally dominant, the union of the Gershgorin discs lies strictly in the open left half of the complex plane, and we conclude that \(\text{ Re }\{\lambda \}<0\) for all \(\lambda \in \lambda (H)\).
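For a hypothetical 2×2 strictly diagonally dominant matrix with negative diagonal, the Gershgorin argument can be verified directly:

```python
import math

# hypothetical 2x2 Hessian: negative diagonal, strictly diagonally dominant
h11, h12, h21, h22 = -2.0, 0.5, 0.7, -3.0

# eigenvalues of the 2x2 matrix from its characteristic polynomial
tr, det = h11 + h22, h11 * h22 - h12 * h21
disc = math.sqrt(tr * tr - 4 * det)  # real for these values
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

for lam in (lam1, lam2):
    assert lam < 0  # Re(lambda) < 0, as the Gershgorin argument predicts
    # each eigenvalue lies in the union of the (closed) Gershgorin discs (77)
    assert (abs(lam - h11) <= abs(h12)) or (abs(lam - h22) <= abs(h21))
```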

For simplicity of notation, we suppress explicit dependence of the variables on t. The time derivative of V is given by

$$\begin{aligned} {\dot{V}}= & {} 2\vartheta ^TKHU_\mathrm{av} + \dfrac{1}{2}\lambda _\mathrm{min}(-H)U_\mathrm{av}^TU_\mathrm{av} - \dfrac{1}{4}\lambda _\mathrm{min}(-H)u_\mathrm{av}(0)^Tu_\mathrm{av}(0) \nonumber \\&- \dfrac{1}{4}\lambda _\mathrm{min}(-H)\int _{0}^{1}u_\mathrm{av}(x)^Tu_\mathrm{av}(x)\hbox {d}x + {\tilde{U}}^T(-H)\left( {\dot{U}}_\mathrm{av}-KHU_\mathrm{av}\right) \nonumber \\\le & {} 2\vartheta ^TKHU_\mathrm{av} + \dfrac{1}{2}U_\mathrm{av}^T(-H)U_\mathrm{av} - \dfrac{1}{8D_\mathrm{max}}\lambda _\mathrm{min}(-H)\int _{0}^{1}(1+x)u_\mathrm{av}(x)^TDu_\mathrm{av}(x)\hbox {d}x \nonumber \\&+ {\tilde{U}}^T(-H){\dot{U}}_\mathrm{av} + {\tilde{U}}^T(-H)K(-H)U_\mathrm{av}. \end{aligned}$$
(78)

Applying Young’s inequality to the last term leads to

$$\begin{aligned} {\tilde{U}}^T(-H)K(-H)U_\mathrm{av} \le \dfrac{1}{2}{\tilde{U}}^T(-HKHKH){\tilde{U}} + \dfrac{1}{2}U_\mathrm{av}^T(-H)U_\mathrm{av}. \end{aligned}$$
(79)

Then, completing the square yields

$$\begin{aligned} {\dot{V}}\le & {} {\tilde{U}}^T(-H){\tilde{U}} -\vartheta ^TK(-H)K\vartheta \nonumber \\&- \dfrac{1}{8D_\mathrm{max}}\lambda _\mathrm{min}(-H)\int _{0}^{1}(1+x)u_\mathrm{av}(x)^TDu_\mathrm{av}(x)\hbox {d}x \nonumber \\&+ {\tilde{U}}^T(-H){\dot{U}}_\mathrm{av} + \dfrac{1}{2}{\tilde{U}}^T(-HKHKH){\tilde{U}} \nonumber \\= & {} {\tilde{U}}^T(-H)\left( {\dot{U}}_\mathrm{av}+c^*{\tilde{U}}\right) - \vartheta ^TK(-H)K\vartheta \nonumber \\&- \dfrac{1}{8D_\mathrm{max}}\lambda _\mathrm{min}(-H)\int _{0}^{1} (1 + x)u_\mathrm{av}(x)^T Du_\mathrm{av}(x)\hbox {d}x, \end{aligned}$$
(80)

where \(c^*{:=}1+\lambda _\mathrm{max}(-HKHKH)/\lambda _\mathrm{min}(-H)\). Hence, by setting \({\dot{U}}_\mathrm{av} = -\,c{\tilde{U}}\) for some \(c > c^*\), we see that there exists \(\mu >0\) such that

$$\begin{aligned} {\dot{V}} \le -\mu V. \end{aligned}$$
(81)

Finally, it is not difficult to find constants \(\alpha , \beta > 0\) such that

$$\begin{aligned}&\alpha \left( |{\tilde{\theta }}_\mathrm{av}(t)|^2 + \int _{0}^{1}|u_\mathrm{{av}}(x,t)|^2\hbox {d}x+ |{\tilde{U}}(t)|^2\right) \le V(t) \nonumber \\&\qquad \le \beta \left( |{\tilde{\theta }}_\mathrm{av}(t)|^2 + \int _{0}^{1}|u_\mathrm{{av}}(x,t)|^2\hbox {d}x+ |{\tilde{U}}(t)|^2\right) . \end{aligned}$$
(82)

Therefore, the average system (69)–(72) is exponentially stable as long as \(c > c^*\).
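The threshold \(c^*\) can be illustrated with a scalar (N = 1) Euler simulation of the average closed-loop system (69)–(72), the delay being handled with a history buffer; the values H = −1, K = 0.5, D = 0.5 and c = 5 > c* = 1 + H²K² = 1.25 are hypothetical:

```python
H, K, D, c = -1.0, 0.5, 0.5, 5.0   # hypothetical game data; c > c* = 1.25
dt, T = 1e-3, 30.0
nD = int(D / dt)                   # delay-buffer length

theta = 1.0                        # tilde-theta_av(0)
Uc = 0.0                           # U_av(0)
buf = [0.0] * nD                   # U_av on [t - D, t) (zero initial history)

t = 0.0
while t < T:
    integral = sum(buf) * dt                       # distributed term in (72)
    dU = -c * Uc + c * K * H * (theta + integral)  # eq. (72)
    theta += dt * buf[0]                           # eq. (69): u_av(0,t) = U_av(t - D)
    buf = buf[1:] + [Uc]                           # shift the delay line
    Uc += dt * dU
    t += dt

assert abs(theta) < 0.05 and abs(Uc) < 0.05        # decay to the origin
```

In this scalar case \(Z(t)=\tilde{\theta}(t)+\int_{t-D}^{t}U(\tau)\hbox {d}\tau\) obeys a delay-free second-order linear ODE, so the observed exponential decay is expected.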

Step 4: Invoking the Averaging Theorem for Infinite-Dimensional Systems

Indeed, the closed-loop system (38)–(40) can be rewritten as:

$$\begin{aligned} {{\dot{\eta }}}(t)= & {} f(\omega t, \eta _t), \end{aligned}$$
(83)

where \(\eta (t) = \left[ \eta _{1}(t),\eta _{2}(t)\right] ^{T} {:=} \left[ {\tilde{\theta }}(t),U(t)\right] ^{T}\) is the state vector and \(\eta _{t}(\varTheta ) = \eta (t + \varTheta )\) for \(-D_{N}\le \varTheta \le 0\). The vector field f is given in Eq. (84), such that \({\dot{\eta }}(t)=\left[ \dot{{\tilde{\theta }}}(t),{\dot{U}}(t)\right] ^{T}=f(\omega t,\eta _{t})\). At this point, it is worth noting that the variable \(\eta _{t}\in C([-D_{N}, 0]; {\mathbb {R}}^{2N})\) includes not only terms subject to a pointwise (discrete) delay (for instance, \({\tilde{\theta }}_{i}(t-D_{i})\) and \(U_{i}(t-D_{i})\)), but also terms with distributed delays such as \(\int _{-D_{i}}^{0}U_{i}(t+\tau )\hbox {d}\tau \). For the sake of clarity, let us express the discrete terms by \( \eta _{t1} = \left[ {\tilde{\theta }}_{1}(t-D_{1}), \ldots ,{\tilde{\theta }}_{N}(t-D_{N})\right] ^{T}\) and \(\eta _{t2}=\left[ U_{1}(t-D_{1}), \ldots , U_{N}(t-D_{N})\right] ^{T}\), for \(\varTheta =-D_{i}\) in each element of the vector, whereas \(\eta _{t3} = \left[ U_{1}(t+\varTheta _1), \ldots , U_{N}(t+\varTheta _N)\right] ^{T}\) denotes the distributed terms for \(\varTheta _i \in [-D_{N},0]\). The variable \(\upsilon = g(t,\eta _{t})\) is defined with \(g : {\mathbb {R}}_{+} \times \varOmega \rightarrow {\mathbb {R}}^{N}\) and \(\upsilon = \left[ \int _{-D_{1}}^{0}\eta _{t3}^{[1]}(\tau )\hbox {d}\tau , \ldots , \int _{-D_{N}}^{0}\eta _{t3}^{[N]}(\tau )\hbox {d}\tau \right] ^{T}\), where \(\eta _{t3}^{[i]}\) represents the ith element of the vector \(\eta _{t3}\), while \(f : {\mathbb {R}}_{+} \times \varOmega \rightarrow {\mathbb {R}}^{2N}\) is a continuous functional from a neighborhood \(\varOmega \) of 0 of the supremum-normed Banach space \(X = C([-D_{N}, 0]; {\mathbb {R}}^{2N})\) of continuous functions mapping \([-D_{N}, 0]\) into \({\mathbb {R}}^{2N}\).

From Eq. (40), one has

$$\begin{aligned} \dot{{\tilde{\theta }}}(t)=U(t-D), \end{aligned}$$

with \(U(t-D)= \begin{bmatrix} U_1(t-D_1),&U_2(t-D_2),&\ldots ,&U_N(t-D_N) \end{bmatrix}^T\). Noting that \(\int _{t-D_{i}}^{t}U_{i}(\tau )\hbox {d}\tau =\int _{-D_{i}}^{0}U_{i}(t+\tau )\hbox {d}\tau \) and plugging (12), (32) and (33) into (38), one can write

$$\begin{aligned} {\dot{U}}_{i}(t)&= -\,cU_{i}(t)+ck_{i}y_{i}(t)\left\{ \frac{2}{a_{i}}\sin (\omega _{i}'\omega t)+\frac{16}{a_{i}^2}\left[ \sin ^{2}(\omega _{i}'\omega t)-\frac{1}{2}\right] \times \right. \\&\quad \left. \times \int _{-D_{i}}^{0}U_{i}(t+\tau )\hbox {d}\tau +\sum _{j\ne i}\frac{4}{a_{i}a_{j}}\sin (\omega _{i}'\omega t)\sin (\omega _{j}'\omega t)\int _{-D_{j}}^{0}U_{j}(t+\tau )\hbox {d}\tau \right\} . \end{aligned}$$

Then, the dynamics of \(\eta \) are simply

$$\begin{aligned} {\dot{\eta }}_{1}^{[1]}&=f_{1}=\eta _{t2}^{[1]}, \nonumber \\ \vdots&\quad \vdots \nonumber \\ {\dot{\eta }}_{1}^{[i]}&=f_{i}=\eta _{t2}^{[i]}, \nonumber \\ \vdots&\quad \vdots \nonumber \\ {\dot{\eta }}_{1}^{[N]}&=f_{N}=\eta _{t2}^{[N]}, \nonumber \\ {\dot{\eta }}_{2}^{[1]}&=f_{N+1}=-c\eta _{2}^{[1]}+ck_{1}y_{1}\left\{ \frac{2}{a_{1}}\sin (\omega _{1}'\omega t)+\frac{16}{a_{1}^2}\left[ \sin ^{2}(\omega _{1}'\omega t)-\frac{1}{2}\right] \upsilon _{1}+\right. \nonumber \\&\qquad \left. ~~~~~~~~~~+\sum _{j= 2}^{N}\frac{4}{a_{1}a_{j}}\sin (\omega _{1}'\omega t)\sin (\omega _{j}'\omega t)\upsilon _{j}\right\} , \nonumber \\ \vdots&\quad \vdots \nonumber \\ {\dot{\eta }}_{2}^{[i]}&=f_{N+i}=-c\eta _{2}^{[i]}+ck_{i}y_{i}\left\{ \frac{2}{a_{i}}\sin (\omega _{i}'\omega t)+\frac{16}{a_{i}^2}\left[ \sin ^{2}(\omega _{i}'\omega t)-\frac{1}{2}\right] \upsilon _{i}+\right. \nonumber \\&\qquad \left. ~~~~~~~~~~+\sum _{j\ne i}\frac{4}{a_{i}a_{j}}\sin (\omega _{i}'\omega t)\sin (\omega _{j}'\omega t)\upsilon _{j}\right\} , \nonumber \\ \vdots&\quad \vdots \nonumber \\ {\dot{\eta }}_{2}^{[N]}&=f_{2N}=-c\eta _{2}^{[N]}+ck_{N}y_{N}\left\{ \frac{2}{a_{N}}\sin (\omega _{N}'\omega t)+\frac{16}{a_{N}^2}\left[ \sin ^{2}(\omega _{N}'\omega t)-\frac{1}{2}\right] \upsilon _{N}+\right. \nonumber \\&\qquad \left. ~~~~~~~~+\sum _{j= 1}^{N-1}\frac{4}{a_{N}a_{j}}\sin (\omega _{N}'\omega t)\sin (\omega _{j}'\omega t)\upsilon _{j}\right\} , \end{aligned}$$
(84)

where \(y_{i}=y_{i}(t)=y_{i}(\omega t, \eta _{t1})\) according to (18), for all \(i \in \left\{ 1,\ldots ,N\right\} \), satisfying

$$\begin{aligned} y_{i}&=\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\left[ a_{j}\sin (\omega _{j}'\omega t)\eta _{t1}^{[k]}+a_{k}\sin (\omega _{k}'\omega t)\eta _{t1}^{[j]}\right] + \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\left[ \theta _{k}^{*}a_{j}\sin (\omega _{j}'\omega t)+\theta _{j}^{*}a_{k}\sin (\omega _{k}'\omega t)\right] + \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}a_{j}a_{k}\sin (\omega _{j}'\omega t)\sin (\omega _{k}'\omega t)+ \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\left[ \theta _{k}^{*}\eta _{t1}^{[j]}+\theta _{j}^{*}\eta _{t1}^{[k]}\right] +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\eta _{t1}^{[j]}\eta _{t1}^{[k]}+ \\&\quad +\frac{1}{2}\sum _{j=1}^{N}\sum _{k=1}^{N}\epsilon _{jk}^{i}H_{jk}^{i}\theta _{j}^{*}\theta _{k}^{*}+\sum _{j=1}^{N}h_{j}^{i}a_{j}\sin (\omega _{j}'\omega t)+\sum _{j=1}^{N}h_{j}^{i}\eta _{t1}^{[j]}+\sum _{j=1}^{N}h_{j}^{i}\theta _{j}^{*}+c_{i}. \end{aligned}$$

Therefore, \(f(\omega t, \eta _{t}) {:=} [f_{1},\ldots ,f_{2N}]^{T}\) is simply given by the right-hand side of equation (84) such that the averaging theorem by [32] (see Theorem A.1 in Appendix) can be directly applied, considering \(\omega =1/\epsilon \).

From (81), the origin of the average closed-loop system (69)–(72), with the transport PDE representing the delay, is locally exponentially stable. Then, from (73) and (74), we can conclude the same stability result in the norm

$$\begin{aligned} \left( \sum _{i=1}^{N} \left[ {{{\tilde{\theta }}}}_i^{\mathrm{av}}(t - D_i)\right] ^2 + \int _{0}^{D_i} [u_i^{\mathrm{{av}}}(x,t)]^2 \hbox {d}x + [u_i^{\mathrm{{av}}}(D_i,t)]^2\right) ^{1/2} \end{aligned}$$

since H is non-singular.

Thus, there exist positive constants \(\alpha \) and \(\beta \) such that all solutions satisfy

$$\begin{aligned} \varPsi (t)\le \alpha e^{-\beta t}\varPsi (0), \quad \forall t\ge 0, \end{aligned}$$

where \(\varPsi (t) \triangleq \sum _{i=1}^{N} \left[ {{\tilde{\theta }}}_i^{\mathrm{av}}(t - D_i)\right] ^2 + \int _{0}^{D_i} \left[ u_i^{\mathrm{{av}}}(x,t)\right] ^2 \hbox {d}x + \left[ u_i^{\mathrm{{av}}}(D_i,t)\right] ^2\), or equivalently,

$$\begin{aligned} \varPsi (t)\triangleq \sum _{i=1}^{N} \left[ {{\tilde{\theta }}}_i^{\mathrm{av}}(t-D_i)\right] ^2 + \int _{t-D_i}^{t} \left[ U_i^{\mathrm{{av}}}(\tau )\right] ^2 \hbox {d}\tau + \left[ U_i^{\mathrm{{av}}}(t)\right] ^2, \end{aligned}$$
(85)

using (66). Then, according to the averaging theorem by [32] (see also Theorem A.1 in Appendix), for \(\omega \) sufficiently large, (38)–(40), or equivalently (69)–(72), has a unique locally exponentially stable periodic solution around its equilibrium (origin) satisfying (58).

Step 5: Asymptotic Convergence to a Neighborhood of the Nash equilibrium

By first using the change of variables \({\tilde{\vartheta }}_i(t) {:=} {\tilde{\theta }}_i(t-D_i)={\hat{\theta _i}}(t-D_i)-\theta _i^*\), and then integrating both sides of (60) over \([t, \sigma +D_i]\), we have:

$$\begin{aligned} {\tilde{\vartheta }}_i(\sigma +D_i)={{\tilde{\vartheta }}}_i(t)+\int _{t}^{\sigma +D_i}u_i(0,s)\hbox {d}s , \qquad i=1,\ldots ,N. \end{aligned}$$
(86)

From (66), we can rewrite (86) in terms of U, namely

$$\begin{aligned} {\tilde{\vartheta }}_i(\sigma +D_i)={{\tilde{\vartheta }}}_i(t)+\int _{t-D_i}^{\sigma }U_i(\tau ) \hbox {d}\tau . \end{aligned}$$
(87)

Now, note that

$$\begin{aligned} {\tilde{\theta }}_i(\sigma )={{\tilde{\vartheta }}}_i(\sigma +D_i), \quad \forall \sigma \in [t-D_i,t]. \end{aligned}$$
(88)

Hence,

$$\begin{aligned} {\tilde{\theta }}_i(\sigma )={{\tilde{\theta }}}_i(t-D_i)+\int _{t-D_i}^{\sigma }U_i(\tau ) \hbox {d}\tau , \quad \forall \sigma \in [t-D_i,t]. \end{aligned}$$
(89)

Applying the supremum norm to both sides of (89), we have

$$\begin{aligned}&\sup _{t-D_i \le \sigma \le t}\left| {\tilde{\theta }}_i(\sigma )\right| \nonumber \\&\le \sup _{t-D_i \le \sigma \le t}\left| {{\tilde{\theta }}}_i(t - D_i) \right| + \sup _{t-D_i \le \sigma \le t}\left| \int _{t-D_i}^{\sigma } U_i(\tau ) \hbox {d}\tau \right| \nonumber \\&\le \left| {{\tilde{\theta }}}_i(t - D_i)\right| + \int _{t-D_i}^{t} \left| U_i(\tau )\right| \hbox {d}\tau \nonumber \\&\le \left| {{\tilde{\theta }}}_i(t - D_i)\right| + \left( \int _{t-D_i}^{t} \hbox {d}\tau \right) ^{1/2} \left( \int _{t - D_i}^{t} \left| U_i(\tau )\right| ^2 \hbox {d}\tau \right) ^{1/2} \quad {{\text { (by Cauchy--Schwarz)}}} \nonumber \\&\le \left| {{\tilde{\theta }}}_i(t - D_i)\right| + \sqrt{D_i} \left( \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2}. \end{aligned}$$
(90)

Now, it is easy to check

$$\begin{aligned}&\left| {{\tilde{\theta }}}_i(t-D_i)\right| \le \left( \left| {{\tilde{\theta }}}_i(t-D_i)\right| ^{2} + \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2}, \end{aligned}$$
(91)
$$\begin{aligned}&\left( \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2} \le \left( \left| {{\tilde{\theta }}}_i(t-D_i)\right| ^{2} + \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2}. \end{aligned}$$
(92)

By using (91) and (92), one has

$$\begin{aligned} \left| {{\tilde{\theta }}}_i(t-D_i)\right|&+ \sqrt{D_i} \left( \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2} \nonumber \\&\le (1+\sqrt{D_i})\left( \left| {{\tilde{\theta }}}_i(t-D_i)\right| ^{2} + \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2}. \end{aligned}$$
(93)

From (90), it is straightforward to conclude that

$$\begin{aligned} \sup _{t-D_i \le \sigma \le t}\left| {\tilde{\theta }}_i(\sigma )\right| \le (1 + \sqrt{D_i})\left( \left| {{\tilde{\theta }}}_i(t - D_i)\right| ^{2} + \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2} \end{aligned}$$
(94)

and, consequently,

$$\begin{aligned} \left| {\tilde{\theta }}_i(t)\right| \le (1 + \sqrt{D_i})\left( \left| {{\tilde{\theta }}}_i(t - D_i)\right| ^{2} + \int _{t-D_i}^{t} U_i^2(\tau ) \hbox {d}\tau \right) ^{1/2}. \end{aligned}$$
(95)
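Inequality (95) is easy to probe numerically; the sketch below uses hypothetical values D = 0.8, \({\tilde{\theta }}_i(t-D_i)=0.3\) and the arbitrary input \(U_i(\tau )=\cos \tau \):

```python
import math

D, t = 0.8, 5.0
theta_delayed = 0.3                 # tilde-theta_i(t - D_i), hypothetical

# tilde-theta_i(t) from (89) with sigma = t (exact integral of cos)
theta_t = theta_delayed + (math.sin(t) - math.sin(t - D))

# squared L2 norm of U_i = cos on [t - D, t] in closed form
l2sq = D / 2 + (math.sin(2 * t) - math.sin(2 * (t - D))) / 4
bound = (1 + math.sqrt(D)) * math.sqrt(theta_delayed**2 + l2sq)
assert abs(theta_t) <= bound        # inequality (95)
```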

Inequality (95) can be given in terms of the periodic solution \({\tilde{\theta }}_i^{\varPi }(t-D_i)\), \(U_i^{\varPi }(\tau )\), \(\forall \tau \in [t-D_i,t]\) as follows:

$$\begin{aligned} \left| {\tilde{\theta }}_i(t)\right|&\le (1 + \sqrt{D_i})\left( \left| {{\tilde{\theta }}}_i(t - D_i) - {\tilde{\theta }}_i^{\varPi }(t - D_i) + {\tilde{\theta }}_i^{\varPi }(t - D_i)\right| ^{2} \right. +\nonumber \\&\quad \left. + \int _{t-D_i}^{t} \left[ U_i(\tau ) - U_i^{\varPi }(\tau ) + U_i^{\varPi }(\tau )\right] ^2 \hbox {d}\tau \right) ^{1/2}. \end{aligned}$$
(96)

By applying Young’s inequality and some algebra, the right-hand side of (96), and hence \(\left| {\tilde{\theta }}_i(t)\right| \), can be majorized by

$$\begin{aligned} \left| {\tilde{\theta }}_i(t)\right|&\le \sqrt{2}~(1 + \sqrt{D_i})\left( \left| {{\tilde{\theta }}}_i(t - D_i) - {\tilde{\theta }}_i^{\varPi }(t - D_i)\right| ^2 + \left| {\tilde{\theta }}_i^{\varPi }(t - D_i) \right| ^{2} \right. + \nonumber \\&\quad \left. + \int _{t-D_i}^{t} \left[ U_i(\tau ) - U_i^{\varPi }(\tau )\right] ^2 \hbox {d}\tau + \int _{t-D_i}^{t} \left[ U_i^{\varPi }(\tau )\right] ^2 \hbox {d}\tau \right) ^{1/2}. \end{aligned}$$
(97)

According to the averaging theorem by [32], we can conclude that the actual state converges exponentially to the periodic solution, i.e., \({{\tilde{\theta }}}_i(t-D_i)-{\tilde{\theta }}_i^{\varPi }(t-D_i) \rightarrow 0\) and \(\int _{t-D_i}^{t} \left[ U_i(\tau )-U_i^{\varPi }(\tau )\right] ^2 \hbox {d}\tau \rightarrow 0\), exponentially.

Hence,

$$\begin{aligned} \limsup _{t\rightarrow +\infty }|{\tilde{\theta }}_i(t)|&\le \sqrt{2}~\left( 1+\sqrt{D_i}\right) \\&\quad \times \limsup _{t\rightarrow +\infty }\left( \left| {\tilde{\theta }}_i^{\varPi }(t-D_i)\right| ^{2}+ \int _{t-D_i}^{t} [U_i^{\varPi }(\tau )]^2 \hbox {d}\tau \right) ^{1/2}. \end{aligned}$$

Then, from (58), we can write \(\limsup _{t\rightarrow +\infty }|{\tilde{\theta }}(t)| = {\mathcal {O}}(1/\omega )\). From (35) and recalling that \(\theta (t)={\hat{\theta }}(t)+S(t)\) in (34) with S(t) in (31), one has that \(\theta (t)-\theta ^*={\tilde{\theta }}(t)+S(t)\). Since the first term on the right-hand side is ultimately of order \({\mathcal {O}}(1/\omega )\) and the second term is of order \({\mathcal {O}}(|a|)\), we arrive at (59). \(\square \)

5 Noncooperative Scenario with Delays

In the game under the noncooperative scenario, again subject to multiple and distinct delays, the purpose of the extremum seeking is still to estimate the Nash equilibrium vector \(\theta ^*\), but without allowing any sharing of information among the players. Each player only needs to measure the value of its own payoff function

$$\begin{aligned} y_i(t)=J_i(\theta (t-D)), \end{aligned}$$
(98)

with \(J_i\) given by (1). In this sense, we are able to formulate the closed-loop system in a decentralized fashion, where no knowledge about the payoffs \(y_{-i}\) or actions \(\theta _{-i}\) of the other players is required.

5.1 Decentralized Predictor Using Only the Known Diagonal Terms of the Hessian

In such a decentralized scenario, the dither frequencies \(\omega _{-i}\), the excitation amplitudes \(a_{-i}\), and, consequently, the individual control laws \(U_{-i}(t)\) are not available to Player i. Recalling that the payoff models (1) are also assumed to be unknown, it becomes impossible for an individual player to reconstruct or completely estimate the Hessian matrix H given in (7) by using demodulating signals such as those in (33). Hence, the predictor design presented in Sect. 4 must be reformulated for noncooperative games.

However, it is still possible to design Nash equilibrium seeking control laws for a class of weakly coupled noncooperative games, as shown in the following. For such a class of games, we consider \(\epsilon _{ii}^{i}=1\) and \(\epsilon _{jk}^{i}=\epsilon _{kj}^{i}=\epsilon \), \(\forall j\ne k\), such that \(0<\epsilon <1\) in (1) and (7).

Following the non-sharing information paradigm, the ith player is only able to estimate the element \(H_{ii}^{i}\) of the matrix H in (7) by itself, and the same holds for all players. Therefore, only the diagonal of H can be properly recovered in the average sense. In this way, the signal \(N_{i}(t)\) is now simply defined by:

$$\begin{aligned} N_{i}(t){:=} N_{ii}(t)= \dfrac{16}{a^2_i} \bigg ( \sin ^2 (\omega _i t) - \dfrac{1}{2} \bigg ), \end{aligned}$$
(99)

according to (33). Then, the average version of

$$\begin{aligned} {\hat{H}}_{i}(t)= N_i(t) y_i(t) \end{aligned}$$
(100)

is given by

$$\begin{aligned} {\hat{H}}_{i}^{\mathrm{{av}}}(t)=\left[ N_{i}(t)y_{i}(t)\right] _\mathrm{{av}}=H_{ii}^{i}. \end{aligned}$$
(101)
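The averaging identity (101) can be checked numerically for a single player with a purely quadratic payoff (delays set to zero for the check); the values \(H_{ii}^{i}=-2\), \(a_i=0.1\), \(\omega _i=10\) and the offset \({\hat{\theta }}_i-\theta _i^*=0.3\) are hypothetical:

```python
import math

Hii, a, w = -2.0, 0.1, 10.0          # hypothetical Hessian entry, dither amplitude, frequency
offset = 0.3                         # hat-theta_i - theta_i^*, held constant

def N(t):                            # demodulation signal (99)
    return (16 / a**2) * (math.sin(w * t)**2 - 0.5)

def y(t):                            # purely quadratic payoff with sinusoidal probing
    return 0.5 * Hii * (offset + a * math.sin(w * t))**2

P = 2 * math.pi / w                  # period of the probing signal
n = 4096
avg = sum(N(k * P / n) * y(k * P / n) for k in range(n)) / n
assert abs(avg - Hii) < 1e-6         # eq. (101): the average of N_i * y_i recovers H_ii
```

The rectangle rule over one full period is essentially exact here, since the integrand is a trigonometric polynomial.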

In order to compensate for the time delays, we propose the following predictor-based update law:

$$\begin{aligned}&\dot{\hat{\theta }}_{i}(t) = U_{i}(t), \end{aligned}$$
(102)
$$\begin{aligned}&{\dot{U}}_{i}(t) = -\,c_{i}U_{i}(t) + c_{i}k_{i} \bigg ({\hat{G}}_{i}(t) +{\hat{H}}_{i}(t) \int _{t-D_i}^{t} U_i(\tau )\hbox {d}\tau \bigg ), \end{aligned}$$
(103)

for positive constants \(k_{i}\) and \(c_{i}\).

5.2 ISS-Like Properties for PDE Representation

For the sake of simplicity, we assume \(c_{i} \rightarrow +\infty \) in (103), resulting in the following expression:

$$\begin{aligned} U_{i}(t)=k_{i}\left( {\hat{G}}_{i}(t) +{\hat{H}}_{i}(t) \int _{t-D_i}^{t} U_i(\tau )\hbox {d}\tau \right) , \end{aligned}$$
(104)

such that the delayed closed-loop system (27) and (104) in its average version can be written in the corresponding PDE representation form, given by

$$\begin{aligned}&\dot{{\hat{G}}}_{i}^{\mathrm{{av}}}(t) = \sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}u_{j}^{\mathrm{{av}}}(0,t), \end{aligned}$$
(105)
$$\begin{aligned}&\partial _t u_i^{\mathrm{{av}}}(x,t)=\partial _x u_i^{\mathrm{{av}}}(x,t), \quad x\in ~ {]0,D_i[}, \quad i=1, 2, \ldots ,N, \end{aligned}$$
(106)
$$\begin{aligned}&u_i^{\mathrm{{av}}}(D_i,t)=U_i^{\mathrm{{av}}}(t) . \end{aligned}$$
(107)

The relation between \(u_i^{\mathrm{{av}}}\) and \(U_i^{\mathrm{{av}}}\) is given by \(u_i^{\mathrm{{av}}}(x,t)=U_i^{\mathrm{{av}}}(x+t-D_i)\).

Analogous to the developments carried out in Sect. 4.2, we use the transformation [37]

$$\begin{aligned} {\bar{G}}_{i}^{\mathrm{{av}}}(t)= & {} {\hat{G}}_{i}^{\mathrm{{av}}}(t)+ \sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j^{\mathrm{{av}}}(\tau ) \hbox {d}\tau \nonumber \\= & {} {\hat{G}}_{i}^{\mathrm{{av}}}(t) + \sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\int _{0}^{D_j} u_j^{\mathrm{{av}}}(\xi ,t) \hbox {d}\xi . \end{aligned}$$
(108)

Taking the time-derivative of (108), we obtain

$$\begin{aligned} \dot{{\bar{G}}}_{i}^{\mathrm{{av}}}(t)= & {} \dot{{\hat{G}}}_{i}^{\mathrm{{av}}}(t)+ \sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\int _{0}^{D_j} \partial _{t}u_j^{\mathrm{{av}}}(\xi ,t) \hbox {d}\xi . \end{aligned}$$
(109)

Then, by plugging (105) and (106) into (109), one has

$$\begin{aligned} \dot{{\bar{G}}}_{i}^{\mathrm{{av}}}(t)&=\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}u_{j}^{\mathrm{{av}}}(0,t)+ \sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\int _{0}^{D_j} \partial _{x}u_j^{\mathrm{{av}}}(\xi ,t) \hbox {d}\xi \nonumber \\&=\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}u_{j}^{\mathrm{{av}}}(0,t)+ \sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\left[ u_j^{\mathrm{{av}}}(D_{j},t)- u_j^{\mathrm{{av}}}(0,t)\right] \nonumber \\&=\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}u_j^{\mathrm{{av}}}(D_{j},t). \end{aligned}$$
(110)

Substituting (107) in (110), it is not difficult to see that \({\bar{G}}_{i}^{\mathrm{{av}}}\) satisfies

$$\begin{aligned} \dot{{\bar{G}}}_{i}^{\mathrm{{av}}}(t)=\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i} U_j^{\mathrm{{av}}}(t). \end{aligned}$$
(111)
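The cancellation leading from (105) and (108) to (111) can be verified numerically for two players with made-up data: integrate (105) in closed form for smooth test inputs, build \({\bar{G}}_i\) via (108), and compare its finite-difference derivative with the right-hand side of (111):

```python
import math

eh = [[-2.0, 0.3], [0.3, -3.0]]     # hypothetical entries epsilon_ij * H_ij
D = [0.4, 0.7]                      # hypothetical delays
U = [math.sin, math.cos]            # arbitrary smooth inputs, defined for all t

def Ghat(i, t):
    # Ghat_i(t) = int_0^t sum_j eh[i][j] U_j(s - D_j) ds, with Ghat_i(0) = 0
    s1 = math.cos(D[0]) - math.cos(t - D[0])   # int_0^t sin(s - D1) ds
    s2 = math.sin(t - D[1]) + math.sin(D[1])   # int_0^t cos(s - D2) ds
    return eh[i][0] * s1 + eh[i][1] * s2

def Gbar(i, t):
    # transformation (108): add the distributed-delay terms
    w1 = math.cos(t - D[0]) - math.cos(t)      # int over [t - D1, t] of sin
    w2 = math.sin(t) - math.sin(t - D[1])      # int over [t - D2, t] of cos
    return Ghat(i, t) + eh[i][0] * w1 + eh[i][1] * w2

h, t = 1e-5, 3.0
for i in range(2):
    dGbar = (Gbar(i, t + h) - Gbar(i, t - h)) / (2 * h)
    rhs = eh[i][0] * U[0](t) + eh[i][1] * U[1](t)   # eq. (111): undelayed inputs
    assert abs(dGbar - rhs) < 1e-6
```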

Now, adding and subtracting the terms outside the parentheses in (104), \(U_i(t)\) can be rewritten as:

$$\begin{aligned} U_{i}(t)&=k_{i}\left( {\hat{G}}_{i}(t) +{\hat{H}}_{i}(t) \int _{t-D_i}^{t} U_i(\tau )\hbox {d}\tau +\sum _{j\ne i}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j(\tau ) \hbox {d}\tau \right) \nonumber \\&\quad -k_{i}\sum _{j\ne i}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j(\tau ) \hbox {d}\tau , \end{aligned}$$

(112)

whose average is

$$\begin{aligned} U_{i}^{\mathrm{{av}}}(t)&=k_{i}\left( {\hat{G}}_{i}^{\mathrm{{av}}}(t) +{\hat{H}}_{i}^{\mathrm{{av}}}(t) \int _{t-D_i}^{t} U_i^{\mathrm{{av}}}(\tau )\hbox {d}\tau +\sum _{j\ne i}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j^{\mathrm{{av}}}(\tau ) \hbox {d}\tau \right) + \nonumber \\&\quad -k_{i}\sum _{j\ne i}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j^{\mathrm{{av}}}(\tau ) \hbox {d}\tau \nonumber \\&=k_{i}\left( {\hat{G}}_{i}^{\mathrm{{av}}}(t) +\epsilon _{ii}^{i}H_{ii}^{i} \int _{t-D_i}^{t} U_i^{\mathrm{{av}}}(\tau )\hbox {d}\tau +\sum _{j\ne i}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j^{\mathrm{{av}}}(\tau ) \hbox {d}\tau \right) + \nonumber \\&\quad -k_{i}\sum _{j\ne i}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j^{\mathrm{{av}}}(\tau ) \hbox {d}\tau \nonumber \\&=k_{i}\left( {\hat{G}}_{i}^{\mathrm{{av}}}(t) +\sum _{j=1}^{N}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j^{\mathrm{{av}}}(\tau ) \hbox {d}\tau \right) + \nonumber \\&\quad -k_{i}\sum _{j\ne i}\epsilon _{ij}^{i}H_{ij}^{i}\int _{t-D_j}^{t} U_j^{\mathrm{{av}}}(\tau ) \hbox {d}\tau . \end{aligned}$$
(113)

By defining the auxiliary signals

$$\begin{aligned} \phi _{i}(D,t)&{:=}-\sum _{j\ne i}H_{ij}^{i}\int _{t-D_j}^{t} U_j(\tau ) \hbox {d}\tau , \nonumber \\ \phi _{i}(1,t)&{:=}-\sum _{j\ne i}H_{ij}^{i}\int _{0}^{1} D_ju_j(\xi ,t) \hbox {d}\xi , \end{aligned}$$
(114)

and recalling that \(\epsilon _{jk}^{i}=\epsilon \), \(\forall j\ne k\), Eq. (113) can be rewritten from (108) as

$$\begin{aligned} U_{i}^{\mathrm{{av}}}(t)&=k_{i}{\bar{G}}_{i}^{\mathrm{{av}}}(t)+\epsilon k_{i}\phi _{i}^{\mathrm{{av}}}(D,t). \end{aligned}$$
(115)

Taking into account all players and defining \({\bar{G}}(t) {:=} [{\bar{G}}_{1}(t),\ldots ,{\bar{G}}_{N}(t)]^{T} \in {\mathbb {R}}^{N}\), \(U(t) {:=} [U_{1}(t),\ldots ,U_{N}(t)]^{T} \in {\mathbb {R}}^{N}\) and \(\phi (D,t) {:=} [\phi _{1}(D,t),\ldots ,\phi _{N}(D,t)]^{T} \in {\mathbb {R}}^{N}\), it is possible, from Eqs. (111) and (115), to write the overall average game in the compact form

$$\begin{aligned}&\dot{{\bar{G}}}^{\mathrm{{av}}}(t) = HK {\bar{G}}^{\mathrm{{av}}}(t)+\epsilon HK \phi ^{\mathrm{{av}}}(1,t), \end{aligned}$$
(116)
$$\begin{aligned}&\partial _t u^{\mathrm{{av}}}(x,t)=D^{-1}\partial _x u^{\mathrm{{av}}}(x,t), \quad x\in ~ {]0,1[}, \end{aligned}$$
(117)
$$\begin{aligned}&u^{\mathrm{{av}}}(1,t)=K {\bar{G}}^{\mathrm{{av}}}(t)+\epsilon K \phi ^{\mathrm{{av}}}(1,t) , \end{aligned}$$
(118)

where \(K = \text {diag}\left\{ k_1,\ldots , k_N\right\} \) is a positive-definite diagonal matrix with entries \(k_i > 0\).

From (116), it is clear that the dynamics of the ODE state variable \({\bar{G}}^{\mathrm{{av}}}(t)\) are exponentially ISS [31] with respect to the PDE state u(x, t) through the function \(\phi ^{\mathrm{{av}}}(1,t)\). Moreover, the PDE subsystem (117) is ISS (indeed, finite-time stable) [31] with respect to \({\bar{G}}^{\mathrm{{av}}}(t)\) acting at the boundary condition \(u^{\mathrm{{av}}}(1,t)\).

5.3 Stability Analysis

In this sub-section, we show that the hyperbolic PDE-ODE loop (116)–(118) contains a small parameter \(\epsilon \) which, if chosen sufficiently small, leads to closed-loop stability. To this end, in addition to Assumption 1 formulated in Sect. 4.3, we formalize the following condition for noncooperative games.

Assumption 2

The parameters \(\epsilon _{jk}^{i}\) and \(\epsilon _{kj}^{i}\) which appear in the Hessian matrix H given by (7) satisfy the conditions below:

$$\begin{aligned} \epsilon _{ii}^{i}=1, \quad \epsilon _{jk}^{i}= & {} \epsilon _{kj}^{i}=\epsilon , \quad \forall j\ne k, \end{aligned}$$
(119)

with \(0<\epsilon <1\) in the payoff functions (1).

Assumption 2 could be relaxed to consider different values of the coupling parameters \(\epsilon _i\) for each Player i. However, without loss of generality, we have assumed the same weights for the interconnection channels among the players, both to facilitate the proof of the next theorem and to guarantee that the considered noncooperative game does not favor any specific player. To attain \(\theta ^*\) stably in real time, without any model information (except for the delays \(D_i\)), each Player i employs the noncooperative extremum seeking strategy (103) via predictor feedback.
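To make Assumption 2 concrete, the sketch below builds the pseudo-Hessian of an N-player quadratic game with uniform coupling \(\epsilon \) and checks whether HK is Hurwitz. The numerical values (diagonal entries \(-10\), off-diagonal magnitude 5, mirroring the two-player example of Sect. 7, and the gains) are our own choices, not quantities from the paper:

```python
import numpy as np

# Hypothetical illustration of Assumption 2: an N-player pseudo-Hessian with
# identical diagonal entries H_ii = -10 and uniform off-diagonal coupling
# eps * 5 (both values are our own choices, patterned on Section 7's example).
def pseudo_hessian(N, eps, diag=-10.0, off=5.0):
    H = np.full((N, N), eps * off)
    np.fill_diagonal(H, diag)
    return H

def is_hurwitz(A):
    return bool(np.all(np.real(np.linalg.eigvals(A)) < 0))

K = np.diag([2.0, 5.0, 3.0])        # positive gains k_i > 0
for eps in (0.1, 0.5, 1.2):
    print(eps, is_hurwitz(pseudo_hessian(3, eps) @ K))
# small eps keeps H diagonally dominant (HK Hurwitz); eps = 1.2 breaks it
```

For small \(\epsilon \) the matrix is strictly diagonally dominant with negative diagonal, hence negative definite, and HK is Hurwitz; past the dominance threshold stability is lost, which is the qualitative role of the bound \(0<\epsilon <1\).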

The next theorem provides the stability/convergence properties of the closed-loop extremum seeking feedback for the N-player noncooperative game with delays and non-sharing of information.

Theorem 5.1

Consider the closed-loop system (105)–(107) under Assumptions 1 and 2 and multiple and distinct input delays \(D_i\) for the N-player quadratic noncooperative game with no information sharing, with payoff functions given in (1) and control laws \(U_i(t)\) defined in (103). There exist \(c>0\) and \(\omega >0\) sufficiently large as well as \(\epsilon >0\) sufficiently small such that the closed-loop system with state \({\tilde{\theta }}_i(t-D_i)\), \(U_i(\tau )\), \(\forall \tau \in [t-D_i,t]\) and \(\forall i\in \{1,2,\ldots , N\}\), has a unique locally exponentially stable periodic solution in t of period \(\varPi \), denoted by \({\tilde{\theta }}_i^{\varPi }(t-D_i)\), \(U_i^{\varPi }(\tau )\), \(\forall \tau \in [t-D_i,t]\), satisfying, \(\forall t\ge 0\):

$$\begin{aligned} \left( \sum _{i=1}^{N}\left[ {\tilde{\theta }}_{i}^{\varPi }(t-D_i)\right] ^2 + \int _{t-D_i}^{t} \left[ U_{i}^{\varPi }(\tau )\right] ^2\hbox {d}\tau \right) ^{1/2} \le {\mathcal {O}}(1/\omega ). \end{aligned}$$
(120)

Furthermore,

$$\begin{aligned} \limsup _{t\rightarrow +\infty }|\theta (t)-\theta ^*| = {\mathcal {O}}(|a|+1/\omega ), \end{aligned}$$
(121)

where \(a=[a_1 \ a_2 \ \cdots \ a_N]^T\) and \(\theta ^*\) is the unique Nash equilibrium given by (8).

Proof

The proof of Theorem 5.1 follows steps similar to those employed to prove Theorem 4.1. Hence, we simply point out the main differences instead of giving a full independent proof.

While in Theorem 4.1 it was possible to prove the local exponential stability of the average closed-loop system using a Lyapunov functional (76), a different approach is adopted here for the noncooperative scenario. We will show that it is possible to guarantee the local exponential stability for the average closed-loop system (116)–(118) by means of a small-gain analysis.

First, consider the equivalent hyperbolic PDE-ODE representation (116)–(118) rewritten for each Player i:

$$\begin{aligned}&\dot{{\bar{G}}}^{\mathrm{{av}}}_i(t) = H_{ii}^{i}k_i {\bar{G}}^{\mathrm{{av}}}_i(t)+\epsilon H_{ii}^{i}k_i \phi _i^{\mathrm{{av}}}(1,t), \end{aligned}$$
(122)
$$\begin{aligned}&\partial _t u^{\mathrm{{av}}}_i(x,t)=D_i^{-1}\partial _x u^{\mathrm{{av}}}_i(x,t), \quad x\in ~ {]0,1[}, \end{aligned}$$
(123)
$$\begin{aligned}&u^{\mathrm{{av}}}_i(1,t)= k_i {\bar{G}}^{\mathrm{{av}}}_i(t)+\epsilon k_i \phi ^{\mathrm{{av}}}_i(1,t) , \end{aligned}$$
(124)

where \(H_{ii}^{i} < 0\), \(k_i > 0\), \(0< \epsilon < 1\) and \(D_i^{-1} > 0\). The average closed-loop system (122)–(124) satisfies both Assumptions (H1) and (H2) of the small-gain theorem [31, Theorem 8.1, p. 198] for hyperbolic PDE-ODE loops (see also Theorem A.2 in the Appendix) with \(n=N\), \(c=D_i^{-1}\), \(F({\bar{G}}^{\mathrm{{av}}}_i,u^\mathrm{{av}}, 0)=H_{ii}^{i}k_i {\bar{G}}^{\mathrm{{av}}}_i+\epsilon H_{ii}^{i}k_i \phi _i^{\mathrm{{av}}}(1)\), \(a(x)=f(x,t)=g(x,\bar{G}^\mathrm{{av}}_i,u^\mathrm{{av}})=0\), \(\varphi (0,u^\mathrm{{av}},{\bar{G}}^{\mathrm{{av}}}_i)=k_i {\bar{G}}^{\mathrm{{av}}}_i+\epsilon k_i \phi ^{\mathrm{{av}}}_i(1)\), \(\bar{N}=\max (k_i,\epsilon k_ik_HD_N)\), \(L=\max (|H_{ii}^{i}|k_i, \epsilon |H_{ii}^{i}|k_{i}k_{H}D_N)\), \(\gamma _2=k_i\), \(A=\gamma _1=0\), \(B=\epsilon k_ik_HD_N\) and \(b_2=0\), for each \(i \in \{1, \ldots , N\}\). Assumption (H1) holds with \(M=1\), \(\gamma _3=\epsilon |H_{ii}^{i}| k_i k_{H} D_N\) and \(\sigma =|H_{ii}^{i}|k_i\), as can be readily verified by means of the variation of constants formula

$$\begin{aligned} {\bar{G}}^{\mathrm{{av}}}_i(t) = \exp (-|H_{ii}^{i}|k_i t){\bar{G}}^{\mathrm{{av}}}_i(0) + \int _{0}^{t} \exp (-|H_{ii}^{i}|k_i(t-s))\epsilon H_{ii}^{i}k_i \phi _i^{\mathrm{{av}}}(1,s)\hbox {d}s, \end{aligned}$$

and from the application of the Cauchy–Schwarz inequality to the term \(\phi _i^{\mathrm{{av}}}(1,t)\) in Eq. (114):

$$\begin{aligned} \phi _i^{\mathrm{{av}}}(1,t) \le \sum _{j\ne i}|H_{ij}^{i}| \left( \int _{0}^{1} D_j^2 \,\hbox {d}\tau \right) ^{\frac{1}{2}} \left( \int _{0}^{1} [u^{\mathrm{{av}}}_j(\xi ,t)]^{2} \,\hbox {d}\xi \right) ^{\frac{1}{2}} \le k_{H} D_N \left( \int _{0}^{1} u^\mathrm{{av}}(\xi ,t)^T u^\mathrm{{av}}(\xi ,t) \,\hbox {d}\xi \right) ^{\frac{1}{2}}, \end{aligned}$$
(125)

since \(\sum _{j\ne i}^{N}|H_{ij}^{i}|< k_{H} < \frac{1}{\epsilon }|H_{ii}^{i}|\), where \(k_{H}\) is a positive constant of order \({\mathcal {O}}(1)\), according to Assumptions 1 and 2.
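As a sanity check of this inequality chain, the following sketch verifies \(\sum _{j\ne i}|H_{ij}^{i}| < k_{H} < |H_{ii}^{i}|/\epsilon \) row by row. The matrix is the hypothetical two-player pseudo-Hessian patterned on the example of Sect. 7, and the value \(k_H=6\) is our own choice between the two bounds:

```python
import numpy as np

# Numeric check of the bound used after (125): for every player i,
#   sum_{j != i} |H_ij| < k_H < |H_ii| / eps.
# The matrix below is the (hypothetical) two-player pseudo-Hessian patterned
# on Section 7's example; k_H = 6 is our own choice between the bounds.
def bounds_hold(H, eps, k_H):
    N = H.shape[0]
    return all(
        sum(abs(H[i, j]) for j in range(N) if j != i) < k_H < abs(H[i, i]) / eps
        for i in range(N)
    )

eps = 0.5
H = np.array([[-10.0, 5 * eps], [5 * eps, -10.0]])
print(bounds_hold(H, eps, k_H=6.0))   # True: 2.5 < 6 < 20 for both rows
```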

It follows that the small-gain condition in Theorem A.2 in the Appendix holds provided \(0<\epsilon <1\) is sufficiently small. In this case, Theorem A.2 allows us to conclude that there exist constants \(\delta ,\varTheta >0\) such that, for every \(u_0^{\mathrm{{av}}} \in C^{0}([0,1])\) and \(\bar{G}^\mathrm{{av}}_{0} \in \mathbb {R}^{N}\), the unique generalized solution of this initial-boundary value problem, with \(u^{\mathrm{{av}}}(x,0)=u^{\mathrm{{av}}}_0\) and \(\bar{G}^\mathrm{{av}}(0)=\bar{G}^\mathrm{{av}}_{0}\), satisfies the following estimate:

$$\begin{aligned} |\bar{G}^\mathrm{{av}}(t)|+\Vert u^{\mathrm{{av}}}(t)\Vert _{\infty }\le \varTheta (|\bar{G}^\mathrm{{av}}_{0}|+\Vert u^{\mathrm{{av}}}_0\Vert _{\infty })\exp (-\delta t), \quad \forall t \ge 0. \end{aligned}$$
(126)

Therefore, we conclude that the origin of the average closed-loop system (116)–(118) is exponentially stable for \(0< \epsilon < 1\) sufficiently small. Then, from (28) and (108), the same conclusion holds in the norm

$$\begin{aligned} \left( \sum _{i=1}^{N}\left[ {\tilde{\theta }}_{i}^{\mathrm{{av}}}(t-D_i)\right] ^2 + \int _{0}^{D_i} \left[ u_{i}^{\mathrm{{av}}}(\tau )\right] ^2\hbox {d}\tau \right) ^{1/2} \end{aligned}$$
(127)

since H is non-singular, i.e., \(|{\tilde{\theta }}_{i}^{\mathrm{{av}}}(t-D_i)|\le |H^{-1}||{\hat{G}}^{\mathrm{{av}}}(t)|\).

As in the proof of Theorem 4.1, the remaining steps are the application of the local averaging theory for infinite-dimensional systems in [32], showing that the periodic solutions indeed satisfy inequality (120), followed by the conclusion of the attractiveness of the Nash equilibrium \(\theta ^*\) according to (121). \(\square \)

6 Trade-Off Between Measurement Requirements and System Restrictions in the Cooperative and Noncooperative Nash Equilibrium Seeking Approaches

In the cooperative approach, a certain amount of information is shared, which collectively facilitates the construction of the control law implemented by each player. In the context of extremum seeking, this information comprises the frequencies \(\{\omega _{1},\ldots ,\omega _{N}\}\) and the amplitudes \(\{a_{1},\ldots ,a_{N}\}\) of the dither signals, as well as the players' actions \(\{U_{1},\ldots ,U_{N}\}\) and the delays \(\{D_{1},\ldots ,D_{N}\}\), which are known to all players in the prediction process. From this perspective, through appropriate signals S(t), M(t) and N(t), each Player i is able to individually estimate (on average) the advanced gradient of its payoff function, i.e., \({\hat{G}}_{i}^{\mathrm{{av}}}(t+D_{i})\), as well as the Hessian matrix H in (7), by employing (55) and (56). Note that Eq. (38) can be interpreted as a filtered version of the advanced estimate \({\hat{G}}_{i}^{\mathrm{{av}}}(t+D_{i})\). In other words, by assuming \(c_i=c\) sufficiently large (\(c\rightarrow +\infty \)) in (38) and using (41), it is possible to rewrite (42) in the following vector form:

$$\begin{aligned}&U(t)=K{\hat{G}}(t+D), \quad K=\text {diag}(k_1 \cdots k_N), \quad k_i>0,\nonumber \\&{\hat{G}}(t+D)={\hat{G}}(t)+{\hat{H}}(t)\sum _{j=1}^{N}e_j \int _{t-D_j}^{t} U_j(\tau )\hbox {d}\tau , \end{aligned}$$
(128)

where \(e_j \in {\mathbb {R}}^N\) stands for the jth column of the identity matrix \(I_N \in {\mathbb {R}}^{N \times N}\) for each \(j \in \{1, 2,\ldots , N\}\), and \({\hat{G}}(t)\), \({\hat{H}}(t)\) are given in (64) and (65), respectively. Therefore, in the cooperative approach, the control law is able to fully predict the gradient, such that the average closed-loop equation in (29) can be rewritten as

$$\begin{aligned} \dot{{\hat{G}}}^{\text {av}}(t)&=HU^{\text {av}}(t-D)=HK{\hat{G}}^{\text {av}}(t), \end{aligned}$$

with \(U^{\text {av}}(t-D)= [U^{\text {av}}_1(t-D_1),\ U^{\text {av}}_2(t-D_2),\ \ldots ,\ U^{\text {av}}_N(t-D_N)]^T\). Hence, for HK Hurwitz, we can conclude that \({\hat{G}}^{\text {av}}(t) \rightarrow 0\). Moreover, from (28) and the assumption of H being invertible, one has \({\tilde{\theta }}^{\text {av}}(t) \rightarrow 0\). Consequently, \({\hat{\theta }}^{\text {av}}(t) \rightarrow \theta ^{*}\), i.e., the Nash equilibrium is attained at least asymptotically, as shown in Theorem 4.1.
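The decay of \({\hat{G}}^{\text {av}}(t)\) under the Hurwitz condition can be illustrated numerically. In the sketch below, all numerical values, including \(\epsilon =0.5\), the gains and the initial condition, are our own choices patterned on the example of Sect. 7; it integrates \(\dot{{\hat{G}}}^{\text {av}}=HK{\hat{G}}^{\text {av}}\) by forward Euler:

```python
import numpy as np

# Sketch of the ideal cooperative average dynamics G_dot = H K G, with a
# (hypothetical) two-player pseudo-Hessian and eps = 0.5; since H K is
# Hurwitz, the average gradient estimate decays to zero, and hence
# theta_av -> theta* because H is invertible.
eps = 0.5
H = np.array([[-10.0, 5 * eps], [5 * eps, -10.0]])
K = np.diag([2.0, 5.0])
A = H @ K
assert np.all(np.real(np.linalg.eigvals(A)) < 0)   # H K is Hurwitz

G = np.array([5.0, -3.0])     # arbitrary initial average gradient estimate
dt = 1e-3
for _ in range(20000):        # 20 s of forward-Euler integration
    G = G + dt * (A @ G)
print(np.linalg.norm(G) < 1e-6)   # True: G_av has decayed to ~0
```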

In the noncooperative scenario, there is no information sharing among the players, and an estimate of the advanced gradient \({\hat{G}}(t+D)\) as in (41) cannot be constructed as described for the cooperative case. Instead, the control law (103) is designed using only information from the ith player. In this case, the closed-loop system is rigorously analyzed from a new perspective, by exploiting the small gains in the interconnections of the PDE-ODE cascades. In practice, the control law (103) differs from (38) by estimating only the diagonal terms of the matrix \({\hat{H}}(t)\) in (65), such that (104) can be written as

$$\begin{aligned} U(t)&=K\underbrace{\left( {\hat{G}}(t)+\text {diag}\left\{ {\hat{H}}(t)\right\} \sum \nolimits _{j=1}^{N}e_j \int _{t-D_j}^{t} U_j(\tau )\hbox {d}\tau \right) }_{\ne ~{\hat{G}}(t+D) ~\text {in eq. (128)}}. \end{aligned}$$

Thus, unlike (41), there is no classical a priori prediction of \({\hat{G}}(t+D)\) in the noncooperative scheme. However, after the mathematical manipulations carried out in Sect. 5.2, it is possible to rewrite (112) in the vector form:

$$\begin{aligned} U(t)&=K{\bar{G}}(t)+\epsilon K \phi (D,t), \\ {\bar{G}}(t)&={\hat{G}}(t)+\left[ \text {diag}\left\{ {\hat{H}}(t)\right\} +H-\text {diag}\left\{ H\right\} \right] \sum _{j=1}^{N}e_j \int _{t-D_j}^{t} U_j(\tau )\hbox {d}\tau , \end{aligned}$$

where \({\bar{G}}(t) {:=} [{\bar{G}}_{1}(t),\ldots ,{\bar{G}}_{N}(t)]^{T} \in {\mathbb {R}}^{N}\) and \(\phi (D,t) {:=} [\phi _{1}(D,t),\ldots ,\phi _{N}(D,t)]^{T} \in {\mathbb {R}}^{N}\), with \(\bar{G}_{i}^{\text {av}}(t)\) and \(\phi _i(D,t)\) defined in (108) and (114), respectively. Now, it is easy to verify that even with \({\bar{G}}(t)\ne {\hat{G}}(t+D)\), the average estimate is

$$\begin{aligned} {\bar{G}}^{\text {av}}(t)&={\hat{G}}^{\text {av}}(t+D)={\hat{G}}^{\text {av}}(t)+H\sum _{j=1}^{N}e_j \int _{t-D_j}^{t} U^{\text {av}}_j(\tau )\hbox {d}\tau , \end{aligned}$$

since \(\left[ \text {diag}\left\{ {\hat{H}}(t)\right\} \right] _\mathrm{{av}}=\text {diag}\left\{ H\right\} \) according to (7) and (101). Hence,

$$\begin{aligned} U^{\text {av}}(t)&=K{\bar{G}}^{\text {av}}(t)+\epsilon K \phi ^{\text {av}}(D,t), \nonumber \\&=K{\hat{G}}^{\text {av}}(t+D)+\epsilon K \phi ^{\text {av}}(D,t). \end{aligned}$$
(129)

Finally, by plugging (129) into (29), we obtain

$$\begin{aligned} \dot{{\hat{G}}}^{\text {av}}(t)&=HU^{\text {av}}(t-D)=HK{\hat{G}}^{\text {av}}(t)+\epsilon HK \phi ^{\text {av}}(D,t-D). \end{aligned}$$

Under the condition of HK being Hurwitz (with H invertible), the small-gain analysis performed in the proof of Theorem 5.1 guarantees that the Nash equilibrium can still be reached if \(\epsilon >0\) is sufficiently small, even in this more restrictive scenario of measurement requirements and system restrictions. Additionally, in the limiting case \(\epsilon = 0\), the cooperative and noncooperative Nash seeking strategies become equivalent.

7 Simulations with a Two-Player Game Under Constant Delays

As an example of a noncooperative game with two players employing the proposed extremum seeking strategy for delay compensation, we consider the following payoff functions (1), subject to distinct delays \(D_1=20\) and \(D_2=15\) in the players' decisions, \(i\in \{1,2\}\):

$$\begin{aligned} J_1(\theta (t-D))&=-5\,\theta _1^2(t-D_1)+5\,\epsilon \theta _1(t-D_1)\theta _2(t-D_2)\nonumber \\&\quad +250\,\theta _1(t-D_1)-150\,\theta _2(t-D_2)-3000, \end{aligned}$$
(130)
$$\begin{aligned} J_2(\theta (t-D))&=-5\,\theta _2^2(t-D_2)+5\,\epsilon \theta _1(t-D_1)\theta _2(t-D_2)\nonumber \\&\quad -150\,\theta _1(t-D_1)+150\,\theta _2(t-D_2)+6000, \end{aligned}$$
(131)

which, according to (8), yield the unique Nash equilibrium

$$\begin{aligned} \theta _{1}^{*}&=\frac{100+30\epsilon }{4-\epsilon ^{2}}, \end{aligned}$$
(132)
$$\begin{aligned} \theta _{2}^{*}&=\frac{60+50\epsilon }{4-\epsilon ^{2}}. \end{aligned}$$
(133)
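The closed-form equilibrium (132)–(133) can be cross-checked numerically: the short sketch below (ours, using NumPy) solves the first-order conditions of (130)–(131) as a linear system and compares the result with the stated formulas.

```python
import numpy as np

# Cross-check of the closed-form Nash equilibrium (132)-(133): solve the
# first-order conditions dJ1/dtheta1 = 0 and dJ2/dtheta2 = 0 of the payoffs
# (130)-(131) as a 2x2 linear system and compare with the stated formulas.
def nash_closed_form(eps):
    return np.array([(100 + 30 * eps) / (4 - eps**2),
                     (60 + 50 * eps) / (4 - eps**2)])

def nash_from_gradients(eps):
    # dJ1/dtheta1 = -10 th1 + 5 eps th2 + 250 = 0
    # dJ2/dtheta2 =  5 eps th1 - 10 th2 + 150 = 0
    A = np.array([[-10.0, 5 * eps], [5 * eps, -10.0]])
    b = np.array([-250.0, -150.0])
    return np.linalg.solve(A, b)

for eps in (0.1, 0.5, 1.0):
    assert np.allclose(nash_closed_form(eps), nash_from_gradients(eps))
print(nash_closed_form(1.0))   # [130/3, 110/3], the values used in Sect. 7
```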

In order to attain the Nash equilibrium (132) and (133) without any knowledge of modeling information, the players implement a non-model-based real-time optimization strategy, namely the proposed deterministic extremum seeking with sinusoidal perturbations based on the predictors of Sect. 5, to set their best actions despite the delays (see Fig. 3). Specifically, Players \(P_1\) and \(P_2\) set their actions \(\theta _1\) and \(\theta _2\) according to (15), with adaptation laws \({\hat{\theta }}_{i}(t)\) given in (102) and (103), respectively. The gradient estimates \({\hat{G}}_{i}\) of each player and the diagonal estimates \({\hat{H}}_{i}\) of the Hessian are given in (19) and (100), respectively, where the dither signals \(S_{i}(t)\), \(M_{i}(t)\) and \(N_{i}(t)\) are presented in (10), (11) and (99).

Fig. 3

Nash equilibrium seeking schemes with delay compensation for a two-player noncooperative game

For comparison purposes, except for the delays, the plant and controller parameters were chosen as in [18] in all simulation tests: \(\epsilon =1\), \(a_1=0.075\), \(a_2=0.05\), \(k_1=2\), \(k_2=5\), \(\omega _1=26.75~\)rad/s, \(\omega _2=22~\)rad/s and \(\theta _1(0)={\hat{\theta }}_1(0)=50\), \(\theta _2(0)={\hat{\theta }}_2(0)=\theta _2^*=110/3\). In addition, the time constants of the predictor filters were set to \(c_{1}=c_{2}=100\).
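The full predictor-based loop of Fig. 3 involves the dither signals (10), (11), (99), the estimates (19) and (100) and the filtered predictors (102)–(103), and is not reproduced here. As a much simpler illustration of the underlying Nash seeking mechanism, the sketch below simulates only the delay-free baseline corresponding to Fig. 4. The washout filter (cutoff h) and the small gains \(k_1=k_2=0.002\) are our own choices for numerical convenience and do not match the parameters listed above; only the payoffs, dithers and initial conditions follow the example.

```python
import math

# Delay-free two-player Nash seeking sketch (the baseline of Fig. 4, NOT the
# predictor scheme): sinusoidal dithers, demodulation of each player's own
# measured payoff, and gradient ascent.  The washout filter (cutoff h) and
# the small gains k1 = k2 = 0.002 are our own choices for this sketch; the
# payoffs, dither parameters and initial conditions follow Section 7.
eps = 1.0
a1, a2 = 0.075, 0.05
w1, w2 = 26.75, 22.0
k1 = k2 = 0.002                       # hypothetical small adaptation gains
th1, th2 = 50.0, 110.0 / 3.0          # hat(theta)_i(0) as in the simulations

def J(x1, x2):
    J1 = -5*x1*x1 + 5*eps*x1*x2 + 250*x1 - 150*x2 - 3000
    J2 = -5*x2*x2 + 5*eps*x1*x2 - 150*x1 + 150*x2 + 6000
    return J1, J2

h = 5.0                               # washout-filter cutoff (our choice)
e1, e2 = J(th1, th2)                  # filter states start at J(theta(0))
dt = 1e-3
for n in range(600000):               # 600 s of simulated time
    t = n * dt
    s1, s2 = math.sin(w1 * t), math.sin(w2 * t)
    y1, y2 = J(th1 + a1 * s1, th2 + a2 * s2)       # measured payoffs
    th1 += dt * k1 * (2.0 / a1) * s1 * (y1 - e1)   # demodulate and ascend
    th2 += dt * k2 * (2.0 / a2) * s2 * (y2 - e2)
    e1 += dt * h * (y1 - e1)          # washout (high-pass) filters
    e2 += dt * h * (y2 - e2)

print(abs(th1 - 130.0 / 3.0) + abs(th2 - 110.0 / 3.0))   # small residual
```

With these (hypothetical) small gains, the average dynamics converge with time constants of roughly 35–100 s, so the 600 s horizon leaves the estimates in a small residual neighborhood of \((\theta _1^*,\theta _2^*)=(130/3,110/3)\), up to the dither and demodulation ripple.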

Fig. 4

Delay free

Fig. 5

Uncompensated delays

Fig. 6

Compensated delays

Unlike other classical (delay-free) strategies for noncooperative games [13], the proposed extremum seeking algorithm requires the players to measure only the values of their own payoff functions, \(J_1\) and \(J_2\).

In Fig. 4a, b, we can verify that the extremum seeking approach proposed in [18] is effective when delays in the decision variables are not taken into account: the players find the Nash equilibrium in about 100 seconds. In contrast, in the presence of delays \(D_1\) and \(D_2\) in the input signals \(\theta _1\) and \(\theta _2\), but without any kind of delay compensation, Fig. 5a, b show that the game collapses, with its variables diverging. Finally, Fig. 6a, b show that the proposed predictor scheme remedies this, converging to the Nash equilibrium while simultaneously compensating the effect of the delays in our noncooperative game.

This first set of simulations indicates that, even under an adversarial scenario of strong coupling between the players with \(\epsilon =1\), the proposed approach behaves successfully. This suggests that our stability analysis may be conservative and that the theoretical assumption \(0< \epsilon < 1\) may be relaxed, given the performance of the closed-loop control system. In Fig. 6c, d, the values \(\epsilon =0.5\) and \(\epsilon =0.1\) are considered in order to evaluate the robustness of the proposed scheme under different levels of coupling between the two players and the corresponding impact on the transient responses (the smaller the coupling \(\epsilon \), the faster the convergence rate).

In a nutshell, independently of the values used for \(\epsilon \), the simulation tests indicate that, even with multiple and distinct delays in the player actions, both players were able to optimize their payoff functions in the noncooperative game by seeking the desired Nash equilibrium.

8 Conclusions

We have introduced a non-model-based approach, via extremum seeking and predictor feedback, to compute in a distributed way the Nash equilibria of (non)cooperative games with N players, unknown quadratic payoff functions and time-delayed actions, under two different information sharing scenarios: cooperative and noncooperative. In the noncooperative scenario, a player can stably attain its Nash equilibrium by measuring only the value of its own payoff function (no other information about the game is needed), while in the cooperative scenario the players must share part of the information about the Hessian matrix estimated by each of them. Local stability and convergence are guaranteed by means of averaging theory in infinite dimensions, Lyapunov functionals and, in the noncooperative case, a small-gain analysis for ODE-PDE loops. Numerical simulations conducted for a two-player game under constant delays support our theoretical results.

Such approaches have potential in many applications. One possibility is in electronic markets, where players negotiate prices in real time as supply and demand fluctuate, such as households in a smart electric grid, and where time delays appear naturally for different reasons or can even be artificially introduced to perturb the overall large-scale system.

For future research, the general formulation admits many possible directions, with various types of delays to be compensated: the players can have their inputs and their measured payoffs delayed simultaneously. Extensions to more general, non-quadratic payoff functions would be of interest as well. Note also that, for games, we have only explored the gradient extremum seeking approach; it would be important to check whether Newton-based extremum seeking is also viable for games. Another interesting topic for future investigation would be the behavior of the proposed scheme, or of different extremum seeking designs [39], in games with other sorts of infinite-dimensional PDE dynamics [40, 41] rather than pure delays.