1 Introduction

We show that an equilibrium of a random multi-period game involving a continuum of players can be used to achieve asymptotically equilibrium results for its large finite counterparts. The latter finite games can model competitive situations involving random and action-dependent evolution of players’ states which in turn influence period-wise payoffs. Their complex natures make equilibria difficult to locate. In contrast, those for the former continuum-player game are simple in form and relatively easy to obtain. Therefore, a bridge between the two types of games can have broad practical implications.

The former continuum-player game can be termed more formally as a sequential semi-anonymous nonatomic game (SSNG). In it, a continuum of players interact with one another in multiple periods; also, each player’s one-time payoff and random one-period state transition are both swayed by his own state and action, as well as the joint distribution of other players’ states and actions. This is indeed the anonymous sequential game studied by Jovanovic and Rosenthal (1988). We use the name SSNG just to be consistent with the single-period nonatomic-game (NG) literature, where anonymity has been reserved for a more special case. An SSNG’s finite counterpart is almost the same except that only a finite number of players are involved. This more realistic situation is much more difficult to handle.

In a few steps, we demonstrate the usefulness of an SSNG equilibrium in finite multi-period games. First, in precise language, Theorem 1 describes the gradual retreat of randomness in finite games as the number n of players tends to \(+\infty \). This paves the way for Theorem 2, which states that an SSNG’s conditional equilibria, in terms of random state-to-action rules, can be used by its large finite counterparts to reach asymptotically equilibrium payoffs on average. A further refinement of this result is achieved in Theorem 3. The above transient results can be extended to the stationary case involving discounted payoffs and infinite time horizons; see Theorem 4. The conditional equilibria that facilitate our study are similar to well-understood distributional equilibria. Their existence is also directly verifiable.

One practical situation to which our results can be applied concerns dynamic price competition. Here, players may be firms producing one identical product type, states may be combinations of the firms’ inventory levels and other static or dynamic characteristics such as unit costs, and actions may be unit prices the firms charge for the product. In every period, the random demand arriving at a firm depends not only on its own price but also on the prices charged by its competitors. Actual sales are further constrained by the available inventory. So a firm’s one-time payoff is a function of both its own state (inventory and probably also cost) and action (price) and of the distribution of others’ actions (prices). Moreover, the firm’s next-period inventory level depends on its current level, the random demand, and potentially an exogenously given production schedule. So the random single-period state transition is potentially a function of the same factors involved in the payoff.

It is a difficult task to predict or prescribe what inventory-dependent prices the firms will or should charge over a finite time horizon. This can be further complicated by diverse scenarios where firms have different degrees of knowledge of their competitors’ inventory levels and/or costs. Our results, on the other hand, will reveal that the nonatomic counterpart SSNG is easier to tackle. Its equilibria can be plugged back into the actual finite-player situations, without regard to the particularities of the scenarios, and still make reasonably good predictions/prescriptions when the number of players is large enough. We are not equipped to answer how large “large enough” is, but a computational study done in a related pricing setting hinted that player numbers “in the tens” seem large enough; see Yang and Xia (2013).

In the remainder of the paper, Sect. 2 surveys the relevant literature. We then spend Sects. 3 and 4 on essentials of SSNGs and finite games, respectively. In Sect. 5, we demonstrate the key result that state evolutions in large finite games will not veer too far away from their NG counterparts. Section 6 is devoted to the main transient result and Sect. 7 to its detailed interpretation. This result is extended to the stationary case in Sect. 8. Implications of these results and the existence of our kind of equilibria for SSNGs are shown in Sect. 9. We conclude the paper in Sect. 10.

2 Literature survey

From early on, NGs have been used as easier-to-analyze proxies of real finite-player situations, such as in the study of perfect competition. Systematic research on NGs started with Schmeidler (1973). He formulated a single-period semi-anonymous NG, wherein the joint distribution of other players’ identities and actions may affect any given player’s payoff. For the case where the action space is finite and the game is anonymous, so that other players’ influence on a game’s outcome is channeled through the marginal action distribution alone, Schmeidler established the existence of pure equilibria. Mas-Colell (1984) showed the existence of distributional equilibria in anonymous NGs with compact metric action spaces. The latter result was extended by Khan and Sun (1990) to a case where players differ on how their preferences over actions are influenced by external action distributions. A survey of related works up until the early 2000s was provided by Khan and Sun (2002).

Much attention has been paid to the topic of pure-equilibrium existence. Khan and Sun (1995) developed a purification scheme involving a countable compact metric action space. Khan and Sun (1999) used non-standard measures on identity spaces and generalized Schmeidler’s pure-equilibrium existence result for more general action spaces. Balder (2002) established pure- and mixed-equilibrium existence results that may be regarded as generalizations of Schmeidler’s corresponding results. Other notable works include Yu and Zhang (2007) and Balder (2008). On the other hand, Khan et al. (1997) identified a certain limit to which Schmeidler’s result can be extended. Recently, Khan et al. (2013) took players’ diverse bio-social traits into consideration and pinpointed saturation of the player-identity distribution as the key to the existence of pure equilibria.

Links between NGs and their finite counterparts were covered in Green (1984), Housman (1988), Carmona (2004), Kalai (2004), Al-Najjar (2008), and Yang (2011). For multi-period games without changing states, Green (1980), Sabourian (1990), and Al-Najjar and Smorodinsky (2001) showed that equilibria for large games are nearly myopic.

SSNGs are both challenging and rewarding to analyze because in them, very realistically, individual states are swayed by players’ own actions as well as by their opponents’ states and actions. Jovanovic and Rosenthal (1988) established the existence of distributional equilibria for such games. This result was generalized by Bergin and Bernhardt (1995) to cases involving aggregate shocks. In SSNGs’ finite-player counterparts, however, randomness in state-distribution evolution will not go away. Besides, a player’s ability to observe other players’ states and actions might also affect his decision. Given these difficulties, it is not surprising that known results on sequential finite-player games are restricted to the stationary setting, where they appear as discounted stochastic games first introduced by Shapley (1953). According to Mertens and Parthasarathy (1987), Duffie et al. (1994), and Solan (1998), for instance, equilibria known to exist for these games come in quite complicated forms that, for real implementation, demand a high degree of coordination among players.

It is therefore natural to ask whether sequential finite-player games can be approximated by their NG counterparts. This question has so far been answered by two unpublished articles. For a case unconcerned with the copula between marginal state and action distributions, Bodoh-Creed (2012) provided an affirmative answer, and went on to show for certain cases that limits of large-game equilibria of a myopic form, when in existence, are NG equilibria. Also, Yang (2015) verified the approximability when both state transitions and action plans are driven by exogenously generated idiosyncratic shocks. Our current study tackles the most general possible setting, without unduly restricting ways in which a player’s payoff can be influenced by other players’ states and actions or ways in which the game can evolve randomly. To achieve results of the same spirit, we have to overcome technical challenges posed by the new phenomenon of sampling from non-product joint probabilities.

Some authors went on to pursue stationary equilibria (SE), which stress the long-run steady-state nature of individual action plans and system-wide multi-states; see, e.g., Hopenhayn (1992) and Adlakha and Johari (2013). The oblivious equilibrium (OE) concept proposed by Weintraub et al. (2008), in order to account for impacts of large players, took the same stationary approach by letting participants be aware of only long-run average system states. Weintraub et al. (2011) showed links between equilibria of infinite-player games and their finite-player brethren for a setting where the long-run average system state could be defined. Though applicable to many situations, the implicit stationarity of SE or OE is, we caution, incompatible with applications that are transient by nature; consider, for instance, the dynamic pricing game mentioned in Sect. 1.

3 The nonatomic game

The SSNG is a game in which a continuum of players interact with one another over multiple periods. A realistic and yet complicating feature is that players possess individual states which, along with all players’ actions, influence their payoffs. The random evolutions of these states, meanwhile, are affected by players’ actions. Furthermore, the semi-anonymous nature of the game means that not only what was done but also, to the extent that states partially reveal player identities, who did what figures prominently in both payoff formation and state evolution. We now provide a detailed account of the game.

3.1 Game primitives

For some natural number \(\bar{t}\in \mathbb {N}\), we let periods \(1,2,\ldots ,\bar{t}\) serve as regular periods and period \(\bar{t}+1\) as the terminal period. For all periods, we let players’ individual states and actions form, respectively, separable metric spaces S and X. We further require that both spaces be discrete. In this paper, such a space always stands for a separable metric space with countably many elements and the additional feature that the infimum of the distances between any two distinct points remains strictly positive. The discreteness requirement will be useful on one occasion, but most of our derivations would go through if the spaces were merely separable metric. Given any separable metric space A, we use \(\mathcal{B}(A)\) for its Borel \(\sigma \)-field and \(\mathcal{P}(A)\) for the set of all probability measures on the measurable space \((A,\mathcal{B}(A))\).

To each player, other players’ states and actions are immediately felt in a semi-anonymous fashion, so that what really matters is the joint distribution of other players’ states and actions. This distribution, which we dub “in-action environment”, is a member of the joint state-action distribution space \(\mathcal{P}(S\times X)\). In any period \(t=1,2,\ldots ,\bar{t}\), a player’s state \(s\in S\), his action \(x\in X\), and the in-action environment \(\tau \in \mathcal{P}(S\times X)\) he faces, together determine his payoff in that period. In particular, there is a function

$$\begin{aligned} \tilde{f}_t:S\times X\times \mathcal{P}(S\times X)\rightarrow [-\bar{f}_t,\bar{f}_t], \end{aligned}$$
(1)

where \(\bar{f}_t\) is some positive constant on the real line \(\mathbb {R}\). It is required that \(\tilde{f}_t(\cdot ,\cdot ,\tau )\) be a measurable map from \(S\times X\) to \([-\bar{f}_t,\bar{f}_t]\) for every \(\tau \in \mathcal{P}(S\times X)\). For the terminal period \(\bar{t}+1\), we let the payoff be 0 in all circumstances.

Now we describe individual players’ random state transitions. Given separable metric spaces A and B, we use \(\mathcal{K}(A,B)\) to represent the space of all kernels from A to B. Each member \(\kappa \in \mathcal{K}(A,B)\subseteq (\mathcal{P}(B))^A\) satisfies that

  1. (i)

    \(\kappa (a)\) is a member of \(\mathcal{P}(B)\) for each \(a\in A\), and

  2. (ii)

    for each \(B'\in \mathcal{B}(B)\), the real-valued function \(\kappa (\cdot |B')\) is measurable.

Note that we have used \(\kappa (a|B')\) rather than the more conventional \(\kappa (B'|a)\) to denote the conditional probability of \(B'\in \mathcal{B}(B)\) when given \(a\in A\). The current notation allows us to always read a formula from left to right. Now, in each period \(t=1,2,\ldots ,\bar{t}\), let there be a function

$$\begin{aligned} \tilde{g}_t:S\times X\times \mathcal{P}(S\times X)\rightarrow \mathcal{P}(S), \end{aligned}$$
(2)

so that \(\tilde{g}_t(\cdot ,\cdot ,\tau )\) is a member of \(\mathcal{K}(S\times X,S)\) for each \(\tau \in \mathcal{P}(S\times X)\). For convenience, we use \(\mathcal{G}(S,X)\) to denote the space of all such functions, or what we shall call “state transition kernels”. In period t, when a player is in individual state \(s\in S\), takes action \(x\in X\), and faces in-action environment \(\tau \in \mathcal{P}(S\times X)\), there will be a \(\tilde{g}_t(s,x,\tau |S')\) chance for his state in period \(t+1\) to be in any \(S'\in \mathcal{B}(S)\).

This setup is versatile enough to embrace different player characteristics. For instance, each \(s\in S\) may comprise two components \(\theta \) and \(\omega \), with the \(\tilde{g}_t\)’s defined through (2) dictating that \(\theta \) stays static over time to serve as a player’s innate type. Certainly, the \(\tilde{f}_t\)’s defined through (1) can have all kinds of trends over \(\theta \) to reflect players’ varying payoff structures.
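To fix ideas, the sketch below encodes the primitives (1) and (2) in Python for a toy instance with two states and two actions. The state and action labels, the payoff rule, and the transition rule are all invented here purely for illustration; the in-action environment \(\tau \) is represented as a dictionary over \(S\times X\).

```python
# A minimal, purely illustrative sketch of the game primitives of Sect. 3.1.
S = ["low", "high"]   # hypothetical individual states
X = ["a", "b"]        # hypothetical actions

def f_t(s, x, tau):
    """Single-period payoff f~_t(s, x, tau) of (1); tau maps (s, x) pairs to weights.
    The dependence on tau (through the mass of players taking action "a") is made up."""
    mass_a = sum(w for (s2, x2), w in tau.items() if x2 == "a")
    base = 1.0 if s == "high" else 0.5
    return base - (0.3 * mass_a if x == "a" else 0.0)

def g_t(s, x, tau):
    """State-transition kernel g~_t(s, x, tau | .) of (2): returns a distribution over S."""
    mass_high = sum(w for (s2, x2), w in tau.items() if s2 == "high")
    up = 0.4 + 0.2 * mass_high if x == "a" else 0.2
    return {"high": up, "low": 1.0 - up}
```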

3.2 Evolution of the environments

In any period \(1,2,\ldots ,\bar{t},\bar{t}+1\), by “pre-action environment” we mean the state distribution \(\sigma \in \mathcal{P}(S)\) of all players. With \(\bar{t}\), S, X, \((\tilde{f}_t|t=1,2,\ldots ,\bar{t})\), and \((\tilde{g}_t|t=1,2,\ldots ,\bar{t})\) all given in the background, we use \(\Gamma (\sigma _1)\) to denote an (SS)NG with \(\sigma _1\in \mathcal{P}(S)\) as its initial period-1 pre-action environment. For this NG, we can use \(\chi _{[1\bar{t}]}=(\chi _t\mid t=1,\ldots ,\bar{t})\in (\mathcal{K}(S,X))^{\bar{t}}\) to denote a policy profile. Here, each \(\chi _t\in \mathcal{K}(S,X)\) is a map from a player’s state to the player’s random action choice. Together with the given initial environment \(\sigma _1\), this policy profile will help to generate a deterministic pre-action environment trajectory \(\sigma _{[1,\bar{t}+1]}=(\sigma _t\mid t=1,2,\ldots ,\bar{t},\bar{t}+1)\in (\mathcal{P}(S))^{\bar{t}+1}\) in an iterative fashion. This process is also intertwined with the formation of in-action environments \(\tau _1,\tau _2,\ldots ,\tau _{\bar{t}}\) faced by all players in periods \(1,2,\ldots ,\bar{t}\).

More notation is needed to precisely describe this evolution. Given distribution \(p\in \mathcal{P}(A)\) and kernel \(\kappa \in \mathcal{K}(A,B)\) for separable metric spaces A and B, there is a natural product \(p\otimes \kappa \in \mathcal{P}(A\times B)\), such that

$$\begin{aligned} (p\otimes \kappa )(A'\times B')=\int _{A'}p(da)\cdot \kappa (a|B'),\quad \forall A'\in \mathcal{B}(A),B'\in \mathcal{B}(B). \end{aligned}$$
(3)

Here, \(p\otimes \kappa \) is essentially the joint distribution generated by the marginal p and conditional distribution \(\kappa \). Obviously, \((p\otimes \kappa )|_A\), the marginal of \(p\otimes \kappa \) on A, is p. At the same time, we use \(p\odot \kappa \) to denote the marginal \((p\otimes \kappa )|_B\), which satisfies

$$\begin{aligned} (p\odot \kappa )(B')= & {} (p\otimes \kappa )|_B(B')=(p\otimes \kappa )(A\times B')\nonumber \\= & {} \int _A p(da)\cdot \kappa (a|B'),\quad \forall B'\in \mathcal{B}(B). \end{aligned}$$
(4)
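For finite A and B, the two operations in (3) and (4) have a transparent computational form. The following sketch, with distributions as dictionaries and kernels as callables (both conventions of ours, not the paper's), will be reused in later illustrations.

```python
from collections import defaultdict

def otimes(p, kappa):
    """The product p (x) kappa of (3): p is a dict a -> probability,
    kappa a callable a -> dict over B; returns a dict over A x B."""
    joint = {}
    for a, pa in p.items():
        for b, kb in kappa(a).items():
            joint[(a, b)] = pa * kb
    return joint

def odot(p, kappa):
    """The marginal p (.) kappa of (4), i.e., the B-marginal of p (x) kappa."""
    marg = defaultdict(float)
    for (a, b), w in otimes(p, kappa).items():
        marg[b] += w
    return dict(marg)
```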

Suppose pre-action environment \(\sigma _t\in \mathcal{P}(S)\) has been given for some period \(t=1,\ldots ,\bar{t}\). Then, for every player with starting state \(s_t\) in the period, his random action will be sampled from the distribution \(\chi _t(s_t|\cdot )\) where as noted before, \(\chi _t\in \mathcal{K}(S,X)\) is every player’s behavioral guide. Thus, all players will together form the commonly felt in-action environment

$$\begin{aligned} \tau _t=\sigma _t\otimes \chi _t. \end{aligned}$$
(5)

For each individual player with state \(s_t\) and realized action \(x_t\), his state \(s_{t+1}\) in period \(t+1\) will, by (2), be distributed according to \(\tilde{g}_t(s_t,x_t,\tau _t|\cdot )\). Thus, it will be reasonable for the pre-action environment in period \(t+1\) to follow \(\sigma _{t+1}=\tau _t\odot \tilde{g}_t(\cdot ,\cdot ,\tau _t)\), with

$$\begin{aligned}{}[\tau _t\odot \tilde{g}_t(\cdot ,\cdot ,\tau _t)](S')=\int _{S\times X}\tau _t(ds\times dx)\cdot \tilde{g}_t(s,x,\tau _t|S'),\quad \forall S'\in \mathcal{B}(S). \end{aligned}$$
(6)

Although (6) has been intuitively reasoned from (2), we caution that logically it is part of the NG’s definition rather than something derivable from the latter.

The transition from \(\sigma _t\) to \(\sigma _{t+1}\) through random action plan \(\chi _t\) is best expressed by an operator. For any kernel \(\chi \in \mathcal{K}(S,X)\), define operator \(T_t(\chi )\) on the space \(\mathcal{P}(S)\), so that

$$\begin{aligned} T_t(\chi )\circ \sigma =(\sigma \otimes \chi )\odot \tilde{g}_t(\cdot ,\cdot ,\sigma \otimes \chi )=\sigma \odot \chi \odot \tilde{g}_t(\cdot ,\cdot ,\sigma \otimes \chi ),\quad \forall \sigma \in \mathcal{P}(S).\nonumber \\ \end{aligned}$$
(7)

Basically, state distribution \(\sigma \) and random state-dependent action plan \(\chi \) first fuse to form the joint state-action distribution \(\sigma \otimes \chi \) to be felt by all players. The latter’s random state transitions are then guided by the kernel \(\tilde{g}_t(\cdot ,\cdot ,\sigma _t\otimes \chi )\). Subsequently, after “averaging out” impacts of actions, the next-period state distribution will become \(\sigma \odot \chi \odot \tilde{g}_t(\cdot ,\cdot ,\sigma \otimes \chi )\). The one-period pre-action environment transition is now representable by

$$\begin{aligned} \sigma _{t+1}=T_t(\chi _t)\circ \sigma _t=\sigma _t\odot \chi _t\odot \tilde{g}_t(\cdot ,\cdot ,\sigma _t\otimes \chi _t). \end{aligned}$$
(8)

For periods t and \(t'\) with \(t\le t'\), as well as sequence \(\chi _{[tt']}=(\chi _{t''}|t''=t,\ldots ,t')\) of action plans, we can iteratively define \(T_{[tt']}(\chi _{[tt']})\), so that

$$\begin{aligned} T_{[tt']}(\chi _{[tt']})\circ \sigma _t=T_{t'}(\chi _{t'})\circ (T_{[t,t'-1]}(\chi _{[t,t'-1]})\circ \sigma _t),\quad \forall \sigma _t\in \mathcal{P}(S). \end{aligned}$$
(9)

The left-hand side will be players’ state distribution in period \(t'+1\) when they start period t with the distribution \(\sigma _t\) and adopt the action sequence \(\chi _{[tt']}\) in the interim. Note that \(T_{[tt]}(\chi _{[tt]})\) is nothing but \(T_t(\chi _t)\). As a default, we let \(T_{[t,t-1]}\) stand for the identity operator on \(\mathcal{P}(S)\). The environment trajectory \(\sigma _{[1,\bar{t}+1]}\) satisfies

$$\begin{aligned} \sigma _{[1,\bar{t}+1]}=(T_{[1,t-1]}(\chi _{[1,t-1]})\circ \sigma _1\mid t=1,2,\ldots ,\bar{t},\bar{t}+1). \end{aligned}$$
(10)

It is deterministic by definition.
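Continuing the toy sketch, the operator \(T_t(\chi )\) of (7) and the deterministic trajectory (10) can be coded directly from the otimes/odot helpers above; the policy below (a 50/50 mixture over the two invented actions) and the kernel g_t are the hypothetical ones from Sect. 3.1.

```python
def T(chi, sigma, g):
    """The one-period map T_t(chi) of (7)-(8) for a transition kernel g(s, x, tau)."""
    tau = otimes(sigma, chi)                            # in-action environment, (5)
    return odot(tau, lambda sx: g(sx[0], sx[1], tau))   # next pre-action environment, (6)

def ng_trajectory(sigma1, chis, g):
    """The deterministic pre-action environment trajectory (10) under profile chis."""
    traj = [sigma1]
    for chi_t in chis:
        traj.append(T(chi_t, traj[-1], g))
    return traj

# example: everyone mixes 50/50 between the two toy actions in every period
chi = lambda s: {"a": 0.5, "b": 0.5}
sigma1 = {"low": 0.7, "high": 0.3}
print(ng_trajectory(sigma1, [chi] * 3, g_t))
```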

4 The n-player game

Let the same \(\bar{t}\), S, X, \((\tilde{f}_t|t=1,2,\ldots ,\bar{t})\), and \((\tilde{g}_t|t=1,2,\ldots ,\bar{t})\) remain in the background. For some \(n\in \mathbb {N}\setminus \{1\}\) and initial multi-state \(s_1=(s_{11},s_{12},\ldots ,s_{1n})\in S^n\), we can define an n-player game \(\Gamma _n(s_1)\), in which each \(s_{1m}\in S\) is player m’s initial state. The game’s payoffs and state evolutions are still described by the \(\tilde{f}_t\)’s and \(\tilde{g}_t\)’s, respectively. However, details are messier as outside environments vary from player to player and their evolutions are random.

For \(a\in A\), where A is again a separable metric space, we use \(\delta _a\) to denote the singleton Dirac measure with \(\delta _a(\{a\})=1\). For \(a=(a_1,\ldots ,a_n)\in A^n\) where \(n\in \mathbb {N}\), we use \(\varepsilon _a\) for \(\sum _{m=1}^n \delta _{a_m}/n\), the empirical distribution generated by the vector a. We also use \(\mathcal{P}_n(A)\) to denote the space of probability measures of the type \(\varepsilon _a\) for \(a\in A^n\), i.e., the space of empirical distributions generated from n samples. Now back at the game \(\Gamma _n(s_1)\), suppose in period \(t=1,2,\ldots ,\bar{t}\), each player \(m=1,2,\ldots ,n\) is in state \(s_{tm}\) and takes action \(x_{tm}\). Then, the in-action environment experienced by player 1 will be \(\varepsilon _{s_{t,-1}x_{t,-1}}=\varepsilon _{((s_{t2},x_{t2}),\ldots ,(s_{tn},x_{tn}))}\). Thus, this player will receive payoff \(\tilde{f}_t(s_{t1},x_{t1},\varepsilon _{s_{t,-1}x_{t,-1}})\) in the period, and his period-\((t+1)\) state \(s_{t+1,1}\) will be sampled from the distribution \(\tilde{g}_t(s_{t1},x_{t1},\varepsilon _{s_{t,-1}x_{t,-1}}|\cdot )\).

Suppose \(\chi _{[1\bar{t}]}=(\chi _t\mid t=1,\ldots ,\bar{t})\in (\mathcal{K}(S,X))^{\bar{t}}\) again describes the policy adopted by all n players. Unlike in an NG, this time \(\chi _{[1\bar{t}]}\) will help to generate a stochastic as opposed to deterministic environment trajectory. To describe each one-period transition in this complex process, we rely on the kernel \(\chi _t^{\;n}\odot \tilde{g}_t^{\;n}\in \mathcal{K}(S^n,S^n)\) defined by

$$\begin{aligned} (\chi _t^{\;n}\odot \tilde{g}_t^{\;n})(s|S')=\int _{X^n}\chi _t^{\;n}(s|dx)\cdot \tilde{g}_t^{\;n}(s,x|S'),\quad \forall s\in S^n, S'\in \mathcal{B}(S^n), \end{aligned}$$
(11)

where \(\chi _t^{\;n}\) is a member of \(\mathcal{K}(S^n,X^n)\) that satisfies

$$\begin{aligned} \chi _t^{\;n}(s|X'_1\times \cdots \times X'_n)=\Pi _{m=1}^n\chi _t(s_m|X'_m),\quad \forall s\in S^n,X'_1,\ldots ,X'_n\in \mathcal{B}(X),\qquad \end{aligned}$$
(12)

and \(\tilde{g}_t^{\;n}\) is a member of \(\mathcal{K}(S^n\times X^n,S^n)\) that satisfies

$$\begin{aligned} \tilde{g}_t^{\;n}(s,x|S'_1\times \cdots \times S'_n)= & {} \Pi _{l=1}^n\tilde{g}_t(s_l,x_l,\varepsilon _{s_{-l}x_{-l}}|S'_l), \nonumber \\&\forall (s,x)\in S^n\times X^n,S'_1,\ldots ,S'_n\in \mathcal{B}(S). \end{aligned}$$
(13)

In combination, (11) can be spelled out as

$$\begin{aligned} (\chi _t^{\;n}\odot \tilde{g}_t^{\;n})(s|S'_1\times \cdots \times S'_n)=\int _{X^n}\Pi _{m=1}^n \chi _t(s_m|dx_m)\cdot \Pi _{l=1}^n \tilde{g}_t(s_l,x_l,\varepsilon _{s_{-l}x_{-l}}|S'_l).\nonumber \\ \end{aligned}$$
(14)

The above reflects that each player m samples his action \(x_m\) from the distribution \(\chi _t(s_m|\cdot )\); once all players’ actions \(x=(x_1,\ldots ,x_n)\) have been determined, each player l faces his unique in-action environment \(\varepsilon _{s_{-l}x_{-l}}\); thus, this player’s period-\((t+1)\) state will be sampled from the distribution \(\tilde{g}_t(s_l,x_l,\varepsilon _{s_{-l}x_{-l}}|\cdot )\).
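The sampling scheme behind (14) is easy to mimic in the toy setting. The sketch below simulates one period of the n-player game; the helpers sample and empirical are our own and, like the kernel g, are only illustrative.

```python
import random
from collections import Counter

def empirical(points):
    """The empirical distribution generated by a list of points, e.g. (s, x) pairs."""
    n = len(points)
    return {k: c / n for k, c in Counter(points).items()}

def sample(dist):
    """Draw one point from a dict-represented distribution."""
    r, acc = random.random(), 0.0
    for k, w in dist.items():
        acc += w
        if r <= acc:
            return k
    return k   # guard against floating-point round-off

def one_period(states, chi, g):
    """One period per (14): each player m samples x_m ~ chi(s_m); each player l then
    transitions according to g(s_l, x_l, eps_{s_{-l} x_{-l}})."""
    actions = [sample(chi(s)) for s in states]
    nxt = []
    for l, (s, x) in enumerate(zip(states, actions)):
        others = [(s2, x2) for m, (s2, x2) in enumerate(zip(states, actions)) if m != l]
        nxt.append(sample(g(s, x, empirical(others))))
    return nxt
```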

When the n players start period t with a random multi-state with distribution \(\pi _{nt}\in \mathcal{P}(S^n)\) and they act according to random rule \(\chi _t\in \mathcal{K}(S,X)\) in the period, they will generate the joint distribution \(\mu _{nt}\in \mathcal{P}(S^n\times X^n)\) of period-t multi-state and -action satisfying

$$\begin{aligned} \mu _{nt}=\pi _{nt}\otimes \chi _t^{\;n}. \end{aligned}$$
(15)

According to (3) and (12), the above means that, for any \(S'\in \mathcal{B}(S^n)\) and \(X'_1,\ldots ,X'_n\in \mathcal{B}(X)\),

$$\begin{aligned} \mu _{nt}(S'\times X'_1\times \cdots \times X'_n)= & {} \int _{S'}\pi _{nt}(ds)\cdot \chi _t^{\;n}(s|X'_1\times \cdots \times X'_n)\nonumber \\= & {} \int _{S'}\pi _{nt}(ds)\cdot \Pi _{m=1}^n \chi _t(s_m|X'_m). \end{aligned}$$
(16)

Clearly, (15) corresponds to (5) in the NG situation.

By (11), the period-\((t+1)\) multi-state distribution \(\mu _{nt}\odot \tilde{g}_t^{\;n}\in \mathcal{P}(S^n)\) will follow

$$\begin{aligned} (\mu _{nt}\odot \tilde{g}_t^{\;n})(S')=\int _{S^n\times X^n}\mu _{nt}(ds\times dx)\cdot \tilde{g}_t^{\;n}(s,x|S'),\quad \forall S'\in \mathcal{B}(S^n). \end{aligned}$$
(17)

Combining (15) and (17), we can see that the one-period transition between multi-states is

$$\begin{aligned} \pi _{n,t+1}=(\pi _{nt}\otimes \chi _t^{\;n})\odot \tilde{g}_t^{\;n}=\pi _{nt}\odot \chi _t^{\;n}\odot \tilde{g}_t^{\;n}. \end{aligned}$$
(18)

Note (18) is the n-player game’s answer to the NG’s (8). Similar to (9), for \(t\le t'\), the distribution \(\pi _{nt'}\) of period-\(t'\) multi-state \(s_{t'}\) is given by

$$\begin{aligned} \pi _{nt'}=\pi _{nt}\odot \Pi _{t''=t}^{t'-1}(\chi _{t''}^{\;n}\odot \tilde{g}_{t''}^{\;n}). \end{aligned}$$
(19)

When the initial multi-state \(s_1\) is randomly drawn from distribution \(\pi _{n1}\), the entire trajectory \(\pi _{n,[1,\bar{t}+1]}=(\pi _{nt}|t=1,2,\ldots ,\bar{t},\bar{t}+1)\) of the n-player game’s multi-state distributions can be written as

$$\begin{aligned} \pi _{n,[1,\bar{t}+1]}=\left( \pi _{n1}\odot \Pi _{t'=1}^{t-1}(\chi _{t'}^{\;n}\odot \tilde{g}_{t'}^{\;n})|t=1,2,\ldots ,\bar{t},\bar{t}+1\right) . \end{aligned}$$
(20)

When all players’ states are sampled from some \(\sigma _1\in \mathcal{P}(S)\), we still have (20) as the trajectory for multi-state distributions, but with \(\pi _{n1}=\sigma _1^{\;n}\). When we take \(\pi _{n1}=\delta _{s_1}\), the Dirac measure in \(\mathcal{P}(S^n)\) that assigns full weight to \(s_1\), (20) describes the evolution of the multi-state distribution for the n-player game \(\Gamma _n(s_1)\), much like (10) did for \(\Gamma (\sigma _1)\).

5 Convergence of aggregate environments

Even before touching upon notions like cumulative payoffs and equilibria, we can already introduce an interesting link between finite games and NGs. It is in terms of an asymptotic relationship between a sequence \(\pi _{n,[t,\bar{t}+1]}=(\pi _{nt'}|t'=t,t+1,\ldots ,\bar{t}+1)\) of multi-state distributions in n-player games and a sequence \(\sigma _{[t,\bar{t}+1]}=(\sigma _{t'}|t'=t,t+1,\ldots ,\bar{t}+1)\) of state distributions in their NG counterparts. The message is that, when starting from similar environments in period t and adopting the same action plan from that period on, stochastic environment paths experienced by large finite games will not drift too much away from the NG’s deterministic environment trajectory. We refrain from using the word convergence because the \(\pi _{nt'}\)’s reside in different spaces for different n’s.

First, we propose the concept asymptotic resemblance in order to precisely describe the way in which members in a sequence of probability measures increasingly resemble the products of a given measure. For a separable metric space A, the space \(\mathcal{P}(A)\) is metrized by the Prohorov metric \(\rho _A\), which induces the weak topology on it. At fixed \(n\in \mathbb {N}\), the map \(\varepsilon _{(\cdot )}\) from \(A^n\) to \(\mathcal{P}_n(A)\subseteq \mathcal{P}(A)\) is continuous. Therefore, for any \(p\in \mathcal{P}(A)\) and \(\epsilon >0\), the set \(\{a\in A^n|\rho _A(\varepsilon _a,p)<\epsilon \}\) is an open subset of \(A^n\) and thus a member of \(\mathcal{B}(A^n)\).

Definition 1

For a separable metric space A, suppose \(p\in \mathcal{P}(A)\) and, for each \(n\in \mathbb {N}\), \(q_n\in \mathcal{P}(A^n)\). We say that the sequence \(q_n\) asymptotically resembles the sequence \(p^n\) made up of p’s n-th order products \(p\times \cdots \times p\), if for any \(\epsilon >0\) and all large enough n,

$$\begin{aligned} q_n(\{a\in A^n|\rho _A(\varepsilon _a,p)<\epsilon \})>1-\epsilon . \end{aligned}$$

Definition 1 says that sequence \(q_n\) will asymptotically resemble the sequence \(p^n\) of product measures when the empirical distribution \(\varepsilon _a\) of a random vector \(a=(a_1,\ldots ,a_n)\), sampled from \(q_n\), is highly likely to be close to p as n approaches \(+\infty \). This resemblance notion is consistent with Prohorov’s theorem (Parthasarathy 2005, Theorem II.7.1), whose weak version is presented as Lemma 2 in Appendix 1. Due to it, any sequence \((p')^n\) will asymptotically resemble the sequence \(p^n\) if and only if \(p'=p\).
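For a finite A, the probability appearing in Definition 1 can be probed by simulation. In the sketch below (which reuses the sample helper from the earlier illustration), total variation distance serves as a convenient stand-in for the Prohorov metric; under the 0-1 metric on a finite space the two coincide. The generator draw_qn is a hypothetical handle for sampling one vector from \(q_n\).

```python
def tv(p, q):
    """Total variation distance between two dict-distributions on a finite space;
    used here only as a simple surrogate for the Prohorov metric rho_A."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def resemblance_frequency(draw_qn, p, n, eps, trials=2000):
    """Monte Carlo estimate of q_n({a in A^n : dist(eps_a, p) < eps}) in Definition 1;
    draw_qn(n) is assumed to return one vector a = (a_1, ..., a_n) sampled from q_n."""
    hits = 0
    for _ in range(trials):
        a = draw_qn(n)
        eps_a = {k: c / n for k, c in Counter(a).items()}
        hits += tv(eps_a, p) < eps
    return hits / trials

# sanity check with q_n = p^n itself: the frequency should approach 1 as n grows
p_check = {"low": 0.7, "high": 0.3}
print(resemblance_frequency(lambda n: [sample(p_check) for _ in range(n)],
                            p_check, n=400, eps=0.05))
```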

Some results related to the resemblance concept have been placed in Appendix 1. Lemma 3 stems from Dvoretzky, Kiefer and Wolfowitz’s (1956) inequality and makes the convergence in Lemma 2 uniform in the chosen probability p. According to Lemma 4, altering one component of any n-long vector \(a\in A^n\) would not much change \(\varepsilon _a\). It is therefore natural for Lemma 5 to state that the resemblance of \(q_n\) to \(p^n\) would lead to that of the \(A^{n-1}\)-marginal \(q_n|_{A^{n-1}}\) to \(p^{n-1}\). Lemma 6 says that the above would also lead to the asymptotic resemblance of \(p'\times q_{n-1}\) to \(p^n\) for any \(p'\). So in general there can be nothing substantial regarding the relationship between the A-marginals \(q_n|_A\) and p. Finally, Lemma 7 shows that asymptotic resemblance is preserved under the projection of \(A\times B\) into A.

The following one-step result states that asymptotic resemblance concerning pre-action environments is translatable into that concerning in-action environments; also, the same resemblance is preserved after undergoing one single step in a game.

Proposition 1

Let a state distribution \(\sigma \in \mathcal{P}(S)\), a random state-dependent action plan \(\chi \in \mathcal{K}(S,X)\), and a state-transition kernel \(g\in \mathcal{G}(S,X)\) be given, with the latter enjoying the continuity of \(g(s,x,\tau )\) in the joint state-action distribution \(\tau \) at an \((s,x)\)-independent rate. Also, let a multi-state distribution \(\pi _n\in \mathcal{P}(S^n)\) be given for each \(n\in \mathbb {N}\). Suppose further that the sequence \(\pi _n\) asymptotically resembles the sequence \(\sigma ^n\). Then,

  1. (i)

    the sequence \(\pi _n\otimes \chi ^n\) will asymptotically resemble the sequence \((\sigma \otimes \chi )^n\), and

  2. (ii)

    the sequence \(\pi _n\odot \chi ^n\odot g^n\) will asymptotically resemble the sequence \((\sigma \odot \chi \odot g(\cdot ,\cdot ,\sigma \otimes \chi ))^n\).

    Indeed, (ii) remains valid under mild contamination. That is, for any \((s,x)\in S\times X\),

  3. (iii)

    the sequence \((\delta _{sx}\times (\pi _{n-1}\otimes \chi ^{n-1}))\odot g^n\) will asymptotically resemble the sequence \((\sigma \odot \chi \odot g(\cdot ,\cdot ,\sigma \otimes \chi ))^n\) at a rate independent of the chosen \((s,x)\).

Proposition 1 is one of our two most technical results. Its proof invokes both Prohorov’s theorem (Parthasarathy 2005, Theorem II.7.1) on the convergence of empirical distributions and, for parts (ii) and (iii), Dvoretzky, Kiefer and Wolfowitz’s (1956) inequality, which provides the uniformity of such convergence. In the proposition, part (i) stresses the passage from convergence of pre-action environments to that of same-period in-action environments, see (5) and (15); part (ii) further points out that convergence in next-period pre-action environments will follow suit, see (8) and (18); also, part (iii) will be useful when we take the viewpoint of a single player.

To take advantage of Proposition 1, we now assume the equi-continuity of the state transitions with respect to in-action environments.

Assumption 1

Each transition kernel \(\tilde{g}_t(s,x,\tau )\) is continuous in \(\tau \) at an \((s,x)\)-independent rate. That is, for any in-action environment \(\tau \in \mathcal{P}(S\times X)\) and \(\epsilon >0\), there is \(\delta >0\), such that for any \(\tau '\in \mathcal{P}(S\times X)\) satisfying \(\rho _{S\times X}(\tau ,\tau ')<\delta \) and any \((s,x)\in S\times X\),

$$\begin{aligned} \rho _S(\tilde{g}_t(s,x,\tau ),\tilde{g}_t(s,x,\tau '))<\epsilon . \end{aligned}$$

We are in a position to derive this section’s main result. It states that, when an NG and its finite counterparts evolve under the same action plan, environment pathways of large finite games, though stochastic, will resemble the deterministic pathway of the NG.

Theorem 1

Let a policy profile \(\chi _{[t\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\) for periods \(t,t+1,\ldots ,\bar{t}\) be given. When \(s_t=(s_{t1},\ldots ,s_{tn})\) has a distribution \(\pi _{nt}\) that asymptotically resembles \(\sigma _t^{\;n}\), the series \((\pi _{nt}\odot \Pi _{t''=t}^{t'-1}(\chi _{t''}^{\;n}\odot \tilde{g}_{t''}^{\;n})\mid t'=t,t+1,\ldots ,\bar{t},\bar{t}+1)\) will asymptotically resemble \(((T_{[t,t'-1]}(\chi _{[t,t'-1]})\circ \sigma _t)^n\mid t'=t,t+1,\ldots ,\bar{t},\bar{t}+1)\) as well. That is, for any \(\epsilon >0\) and any n that is large enough,

$$\begin{aligned}{}[\pi _{nt}\odot \Pi _{t''=t}^{t'-1}(\chi _{t''}^{\;n}\odot \tilde{g}_{t''}^{\;n})](\tilde{A}_{nt'}(\epsilon ))>1-\epsilon ,\quad \forall t'=t,t+1,\ldots ,\bar{t}+1, \end{aligned}$$

where for each \(t'\), the set of multi-states \(\tilde{A}_{nt'}(\epsilon )\in \mathcal{B}(S^n)\) is such that,

$$\begin{aligned} \rho _S(\varepsilon _{s_{t'}},T_{[t,t'-1]}(\chi _{[t,t'-1]})\circ \sigma _t)<\epsilon , \quad \forall s_{t'}\in \tilde{A}_{nt'}(\epsilon ). \end{aligned}$$

Suppose an NG starts period t with pre-action environment \(\sigma _t\) and a slew of finite games start the period with multi-states whose distributions asymptotically resemble products of \(\sigma _t\). Let the evolution of both types of games be guided by players acting according to the same policy profile \(\chi _{[t\bar{t}]}\). Then, as the number of players n involved in the finite games grows indefinitely, Theorem 1 predicts ever smaller chances for the finite games’ period-\(t'\) environments \(\varepsilon _{s_{t'}}\) to stray even slightly from the NG’s deterministic period-\(t'\) environment \(T_{[t,t'-1]}(\chi _{[t,t'-1]})\circ \sigma _t\). For some fixed \(\sigma _1\in \mathcal{P}(S)\), we can plug \(t=1\) and \(\pi _{n1}=\sigma _1^{\;n}\) into Theorem 1. Then, we will obtain the proximity between \(\sigma ^{\;n}_{[1,\bar{t}+1]}=(\sigma ^{\;n}_t|t=1,2,\ldots ,\bar{t},\bar{t}+1)\) and \(\pi _{n,[1,\bar{t}+1]}=(\pi _{nt}|t=1,2,\ldots ,\bar{t},\bar{t}+1)\) for large n’s, where every \(\sigma _t=T_{[1,t-1]}(\chi _{[1,t-1]})\circ \sigma _1\) and every \(\pi _{nt}=\sigma _1^{\;n}\odot \Pi _{t'=1}^{t-1}(\chi _{t'}^{\;n}\odot \tilde{g}_{t'}^{\;n})\). In view of (10) and (20), this means that when large games sample their initial states from an NG’s starting distribution \(\sigma _1\), the former games’ state-distribution trajectories will remain close to that of the latter game.
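In the toy setting, Theorem 1 can be visualized by simulating one n-player sample path and measuring, period by period, how far its empirical state distribution drifts from the deterministic NG trajectory; the sketch reuses the earlier hypothetical helpers (one_period, ng_trajectory, sample, tv) and again substitutes total variation for the Prohorov metric.

```python
def empirical_states(states):
    return {k: c / len(states) for k, c in Counter(states).items()}

def finite_game_path(sigma1, chis, g, n):
    """Empirical state distributions along one n-player sample path, with initial
    states drawn i.i.d. from sigma1 and every player following the profile chis."""
    states = [sample(sigma1) for _ in range(n)]
    path = [empirical_states(states)]
    for chi_t in chis:
        states = one_period(states, chi_t, g)
        path.append(empirical_states(states))
    return path

# period-by-period distance between a 500-player sample path and the NG trajectory
ng_path = ng_trajectory(sigma1, [chi] * 3, g_t)
fg_path = finite_game_path(sigma1, [chi] * 3, g_t, n=500)
print([round(tv(a, b), 3) for a, b in zip(fg_path, ng_path)])
```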

Our confinement so far to discrete spaces S and X arises mainly from the need to deal with non-product joint probabilities of the form \(p\otimes \kappa \); see (3). In Yang (2015), where random state transitions and random action plans were modeled through independently generated shocks, only results pertaining to product-form probabilities \(p\times q\), where q is an ordinary rather than conditional probability, were needed. Because of this, known properties like Propositions III.4.4 and III.4.6 of Ethier and Kurtz (1986) could be put to good use. Results there could thus be based on complete state and shock spaces. In contrast, if we were to consider more general spaces here, we would face the presently insurmountable challenge of passing the closeness between measures p and \(p_i\) for \(i=1,2,\ldots ,n\) onto that between \(p^n\) and \(\prod _{i=1}^n p_i\) when n itself tends to infinity.

6 NG and finite-game equilibria

We present this paper’s main result that an NG equilibrium, though oblivious of past history and blind to other players’ states, will generate minimal regrets when adopted by players in large finite games. First, we introduce equilibrium concepts used in both types of games.

6.1 Equilibria in NG

In defining the NG \(\Gamma (\sigma _1)\)’s equilibria, we subject a candidate policy profile to a one-time deviation by a single player, who is by default infinitesimal in influence. Note the deviation will not alter the environment trajectory corresponding to the candidate profile. With this understanding, we define \(v_t(s_t,\xi _{[t\bar{t}]},\sigma _t,\chi _{[t\bar{t}]})\) as the total expected payoff a player can receive from period t to \(\bar{t}\), when he starts with state \(s_t\in S\) and adopts action plan \(\xi _{[t\bar{t}]}=(\xi _t,\ldots ,\xi _{\bar{t}})\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\) throughout, while other players form initial pre-action environment \(\sigma _t\in \mathcal{P}(S)\) and adopt policy profile \(\chi _{[t\bar{t}]}=(\chi _t,\ldots ,\chi _{\bar{t}})\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\) throughout. As a terminal condition, we certainly have

$$\begin{aligned} v_{\bar{t}+1}(s_{\bar{t}+1},\sigma _{\bar{t}+1})=0. \end{aligned}$$
(21)

For \(t=\bar{t},\bar{t}-1,\ldots ,1\), we have the recursive relationship

$$\begin{aligned}&v_t(s_t,\xi _{[t\bar{t}]},\sigma _t,\chi _{[t\bar{t}]}) =\int _X \xi _t(s_t|dx_t)\cdot \bigg [\tilde{f}_t(s_t,x_t,\sigma _t\otimes \chi _t)\nonumber \\&\quad +\int _S \tilde{g}_t(s_t,x_t,\sigma _t\otimes \chi _t|ds_{t+1})\cdot v_{t+1}(s_{t+1},\xi _{[t+1,\bar{t}]},T_t(\chi _t)\circ \sigma _t,\chi _{[t+1,\bar{t}]})\bigg ].\qquad \end{aligned}$$
(22)

This is because the player’s action is guided in a random fashion by \(\xi _t\), his payoff is determined by \(\tilde{f}_t\), his state evolution is governed by \(\tilde{g}_t\), and his future payoff is supplied by \(v_{t+1}\); also, after the commonly adopted action plan \(\chi _t\) has been carried out, the period-\((t+1)\) pre-action environment \(\sigma _{t+1}\) will be \(T_t(\chi _t)\circ \sigma _t\) as shown in (8). The choice of \(\xi _t\) affects the current player’s period-t action \(x_t\), his period-\((t+1)\) state \(s_{t+1}\), and his future state-action trajectory. However, the change at this negligible player does not alter the period-t in-action environment \(\sigma _t\otimes \chi _t\) given in (5) or any environment in the future. This is the main reason why NGs are easier to handle than their finite-player counterparts.
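For the toy game, the recursion (21)-(22) is a short backward induction once the deterministic trajectory (10) is in hand. The sketch below computes the on-path values with \(\xi _{[t\bar{t}]}=\chi _{[t\bar{t}]}\); a deviation \(\xi _t\) could be assessed by swapping it in for chis[t] in period t. All names reused from earlier sketches are our own illustrative constructs.

```python
def ng_values(sigma1, chis, f, g):
    """Backward recursion (21)-(22) for v_t(s, chi_[t..T], sigma_t, chi_[t..T])."""
    sigmas = ng_trajectory(sigma1, chis, g)               # deterministic path (10)
    values = [None] * len(chis) + [{s: 0.0 for s in S}]   # terminal condition (21)
    for t in range(len(chis) - 1, -1, -1):
        tau = otimes(sigmas[t], chis[t])                  # in-action environment (5)
        v_next, v = values[t + 1], {}
        for s in S:
            v[s] = sum(px * (f(s, x, tau)
                             + sum(q * v_next[s2] for s2, q in g(s, x, tau).items()))
                       for x, px in chis[t](s).items())
        values[t] = v
    return values

print(ng_values(sigma1, [chi] * 3, f_t, g_t)[0])   # period-1 values v_1(., ...)
```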

Now, we deem policy \(\chi _{[1\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}}\) a Markov equilibrium for the game \(\Gamma (\sigma _1)\) when, for every \(t=1,2,\ldots ,\bar{t}\) and \(\xi _t\in \mathcal{K}(S,X)\),

$$\begin{aligned} v_t(s_t,\chi _{[t\bar{t}]},\sigma _t,\chi _{[t\bar{t}]})\ge v_t(s_t,(\xi _t,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]}),\quad \forall s_t\in S, \end{aligned}$$
(23)

where

$$\begin{aligned} \sigma _t=T_{[1,t-1]}(\chi _{[1,t-1]})\circ \sigma _1. \end{aligned}$$
(24)

That is, policy \(\chi _{[1\bar{t}]}\) will be regarded as an equilibrium when no player can be better off by unilaterally deviating to any alternative plan \(\xi _t\in \mathcal{K}(S,X)\) in any single period t. The definition of \(\sigma _t\) in (24) underscores the evolution of the deterministic environment trajectory following the adoption of action plan \(\chi _{[1\bar{t}]}\) by almost all players.

6.2 \(\epsilon \)-equilibria in n-player games

For an n-player game, let \(v_{nt}(s_{t1},\xi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]})\) be the total expected payoff player 1 can receive from period t to \(\bar{t}\), when he starts with state \(s_{t1}\in S\) and adopts action plan \(\xi _{[t\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\) throughout, while other players form initial empirical state distribution \(\varepsilon _{s_{t,-1}}=\varepsilon _{(s_{t2},\ldots ,s_{tn})}\in \mathcal{P}_{n-1}(S)\) and adopt action plan \(\chi _{[t\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\) throughout. As a terminal condition, we have

$$\begin{aligned} v_{n,\bar{t}+1}(s_{\bar{t}+1,1},\varepsilon _{s_{\bar{t}+1,-1}})=0. \end{aligned}$$
(25)

For \(t=\bar{t},\bar{t}-1,\ldots ,1\), we have the recursive relationship

$$\begin{aligned} v_{nt}(s_{t1},\xi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]})= & {} \int _X\xi _t(s_{t1}|dx_{t1})\cdot \int _{X^{n-1}}\chi _t^{\;n-1}(s_{t,-1}|dx_{t,-1})\nonumber \\&\times \,\bigg [\tilde{f}_t(s_{t1},x_{t1},\varepsilon _{s_{t,-1}x_{t,-1}}) +\int _{S^n}\tilde{g}_t^{\;n}(s_t,x_t|ds_{t+1})\nonumber \\&\cdot \, v_{n,t+1}(s_{t+1,1},\xi _{[t+1,\bar{t}]},\varepsilon _{s_{t+1},-1},\chi _{[t+1,\bar{t}]})\bigg ], \end{aligned}$$
(26)

where the meaning of \(\chi _t^{\;n-1}(s_{t,-1}|dx_{t,-1})\) follows from (12) and that of \(\tilde{g}_t^{\;n}(s_t,x_t|ds_{t+1})\) follows from (13). Note (26) differs substantially from its NG counterpart (22). With only a finite number of players, player 1’s one-time choice \(\xi _t\) not only affects his own future actions and states as before, but differently, starting from the altered in-action environment \(\varepsilon _{s_tx_t}\), it also impacts the entire future trajectory of all other players. Note \(\varepsilon _{s_tx_t}\) impacts the generation of \(s_{t+1}=(s_{t+1,1},\ldots ,s_{t+1,n})\) in its projections to n different \((n-1)\)-dimensional spaces, as according to (13), \(\int _{S^n}\tilde{g}_t^{\;n}(s_t,x_t|ds_{t+1})\) amounts to \(\Pi _{m=1}^n\int _S \tilde{g}_t(s_{tm},x_{tm},\varepsilon _{s_{t,-m}x_{t,-m}}|ds_{t+1,m})\).

For each \(n\in \mathbb {N}\setminus \{1\}\), let \(\hat{\pi }_{n-1,[1\bar{t}]}=(\hat{\pi }_{n-1,t}\mid t=1,\ldots ,\bar{t})\in (\mathcal{P}(S^{n-1}))^{\bar{t}}\) be a series of other-player multi-state distributions. For \(\epsilon \ge 0\), we deem \(\chi _{[1\bar{t}]}=(\chi _t\mid t=1,\ldots ,\bar{t})\in (\mathcal{K}(S,X))^{\bar{t}}\) an \(\epsilon \)-Markov equilibrium for the game family \((\Gamma _n(s_1)\mid s_1\in S^n)\) in the sense of \(\hat{\pi }_{n-1,[1\bar{t}]}\) when, for every \(t=1,\ldots ,\bar{t}\), \(\xi _{[t\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\), and \(s_{t1}\in S\),

$$\begin{aligned}&\int _{S^{n-1}}\hat{\pi }_{n-1,t}(ds_{t,-1})\cdot v_{nt}(s_{t1},\chi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]})\nonumber \\&\quad \ge \int _{S^{n-1}}\hat{\pi }_{n-1,t}(ds_{t,-1})\cdot v_{nt}(s_{t1},\xi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]})-\epsilon . \end{aligned}$$
(27)

That is, action plan \(\chi _{[1\bar{t}]}\) will be an \(\epsilon \)-Markov equilibrium in the sense of \(\hat{\pi }_{n-1,[1\bar{t}]}\) when under the plan’s guidance, the average payoff from any period t and player-1 state \(s_{t1}\) on cannot be improved by more than \(\epsilon \) through any unilateral deviation, where the “average” is based on other players’ multi-state \(s_{t,-1}\) being sampled from the distribution \(\hat{\pi }_{n-1,t}\). Note (27) differs from (23) also in that its unilateral deviation need not be one-time.
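As a rough numerical illustration only, the finite-game value in (26) can be estimated by Monte Carlo rollouts in the toy game, and the printed difference mimics the regret gap controlled in (27) for one fixed other-player multi-state (averaging such estimates over draws of the other players' states would approximate the two sides of (27)). The deviation policy and all helper names are, again, invented for the sketch.

```python
def mc_player1_value(s1, others, xis, chis, f, g, runs=1000):
    """Monte Carlo estimate of v_{nt}(s_1, xi_[t..T], eps_{s_{t,-1}}, chi_[t..T]) in (26):
    player 1 follows xis while the remaining n-1 players, with states 'others', follow chis."""
    total = 0.0
    for _ in range(runs):
        s, payoff = [s1] + list(others), 0.0
        for xi_t, chi_t in zip(xis, chis):
            x = [sample(xi_t(s[0]))] + [sample(chi_t(sm)) for sm in s[1:]]
            payoff += f(s[0], x[0], empirical(list(zip(s[1:], x[1:]))))
            env = lambda l: empirical([(sm, xm) for m, (sm, xm)
                                       in enumerate(zip(s, x)) if m != l])
            s = [sample(g(s[l], x[l], env(l))) for l in range(len(s))]
        total += payoff
    return total / runs

others = tuple(sample(sigma1) for _ in range(19))          # a 20-player instance
deviate = lambda s: {"a": 1.0}                             # an arbitrary deviation
on_path = mc_player1_value("low", others, [chi] * 3, [chi] * 3, f_t, g_t)
deviated = mc_player1_value("low", others, [deviate] * 3, [chi] * 3, f_t, g_t)
print(deviated - on_path)
```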

6.3 Main transient result

Before moving on, we need the single-period payoff functions \(\tilde{f}_t\) to be continuous.

Assumption 2

Each payoff function \(\tilde{f}_t(s,x,\tau )\) is continuous in the in-action environment \(\tau \) at an \((s,x)\)-independent rate. That is, for any \(\tau \in \mathcal{P}(S\times X)\) and \(\epsilon >0\), there is \(\delta >0\), such that for any \(\tau '\in \mathcal{P}(S\times X)\) satisfying \(\rho _{S\times X}(\tau ,\tau ')<\delta \) and any \((s,x)\in S\times X\),

$$\begin{aligned} \mid \tilde{f}_t(s,x,\tau )-\tilde{f}_t(s,x,\tau ')\mid <\epsilon . \end{aligned}$$

Now we show the convergence of finite-game value functions to their NG counterparts; the proof is also quite technical and calls upon parts (i) and (iii) of Proposition 1.

Proposition 2

For any \(t=1,2,\ldots ,\bar{t}+1\), let \(\sigma _t\in \mathcal{P}(S)\) and \(\hat{\pi }_{n-1,t}\in \mathcal{P}(S^{n-1})\) for each \(n\in \mathbb {N}\). Suppose the sequence \(\hat{\pi }_{n-1,t}\) asymptotically resembles the sequence \(\sigma _t^{\;n-1}\). Then for any \(\chi _{[t\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\), the sequence \(\int _{S^{n-1}}\hat{\pi }_{n-1,t}(ds_{t,-1})\cdot v_{nt}(s_{t1},\xi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]})\) will converge to \(v_t(s_{t1},\xi _{[t\bar{t}]},\sigma _t,\chi _{[t\bar{t}]})\) at a rate that is independent of both \(s_{t1}\in S\) and \(\xi _{[t\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\).

Combining (23) and (27), as well as Proposition 2, we can come to the main result.

Theorem 2

For some \(\sigma _1\in \mathcal{P}(S)\), suppose \(\chi _{[1\bar{t}]}=(\chi _t\mid t=1,2,\ldots ,\bar{t})\in (\mathcal{K}(S,X))^{\bar{t}}\) is a Markov equilibrium of NG \(\Gamma (\sigma _1)\). Also, suppose \(\hat{\pi }_{n-1,[1\bar{t}]}=(\hat{\pi }_{n-1,t}|t=1,2,\ldots ,\bar{t})\in (\mathcal{P}(S^{n-1}))^{\bar{t}}\) is such that the sequence \(\hat{\pi }_{n-1,t}\) asymptotically resembles the sequence \(\sigma _t^{\;n-1}\) for each t, where \(\sigma _t=T_{[1,t-1]}(\chi _{[1,t-1]})\circ \sigma _1\). Then, for \(\epsilon >0\) and large enough \(n\in \mathbb {N}\), the given \(\chi _{[1\bar{t}]}\) is also an \(\epsilon \)-Markov equilibrium for the game family \((\Gamma _n(s_1)\mid s_1\in S^n)\) in the sense of \(\hat{\pi }_{n-1,[1\bar{t}]}\).

The theorem says that players in a large finite game can agree on an NG equilibrium and expect to lose little on average, as long as the other-player multi-state distribution \(\hat{\pi }_{n-1,t}\) on which “average” is based is similar to the product form \(\sigma _t^{\;n-1}\), where \(\sigma _t=T_{[1,t-1]}(\chi _{[1,t-1]})\circ \sigma _1\) is the corresponding NG’s predictable equilibrium state distribution for the same period. As to whether reasonable \(\hat{\pi }_{n-1,[1\bar{t}]}=(\hat{\pi }_{n-1,t}|t=1,2,\ldots ,\bar{t})\) exists to satisfy this condition, the answer is affirmative. The next section is dedicated to this point.

7 The condition in Theorem 2

We now present examples where the key condition in Theorem 2 can be true. In all of them, we let the initial other-player multi-state distribution \(\hat{\pi }_{n-1,1}=\sigma _1^{\;n-1}=\sigma _1^{\;n}|_{S^{n-1}}\). That is, we let players’ initial states in n-player games be randomly drawn from the NG’s initial state distribution \(\sigma _1\). Now we discuss what can happen in periods \(t=2,3,\ldots ,\bar{t}\).

7.1 Two possibilities

First, we can let each \(\hat{\pi }_{n-1,t}=\sigma _t^{\;n-1}\). It has been discussed right after Definition 1 that the sequence \(\sigma _t^{\;n-1}\) asymptotically resembles itself. So this choice satisfies the condition in Theorem 2. This would correspond to the case where players in large finite games take the “lazy” approach of using independent draws on the NG state distribution to assess their opponents’ states. Note this is reasonable due to the common initial condition for both types of games and Theorem 1.

Second, we can let each \(\hat{\pi }_{n-1,t}=\pi _{nt}|_{S^{n-1}}\), where

$$\begin{aligned} \pi _{nt}=\sigma _1^{\;n}\odot \Pi _{t'=1}^{t-1}(\chi _{t'}^{\;n}\odot \tilde{g}_{t'}^{\;n}). \end{aligned}$$
(28)

According to (19), \(\pi _{nt}\) stands for players’ multi-state distribution in period t in an n-player game when their initial states are randomly drawn from the distribution \(\sigma _1\) and then from period 1 onward players all follow through with the NG equilibrium \(\chi _{[1\bar{t}]}\). Since the sequence \(\sigma _1^{\;n}\) asymptotically resembles itself, Theorem 1 will ascertain the asymptotic resemblance of \(\pi _{nt}\) to \(\sigma _t^{\;n}\). Then, Lemma 5 in Appendix 1 will lead to the asymptotic resemblance of \(\hat{\pi }_{n-1,t}\) to \(\sigma _t^{\;n-1}\). So this choice would satisfy Theorem 2’s condition as well. Also, its meaning is clear—here players in large finite games use precise assessments on what other players’ states might be had they followed the NG equilibrium all along.

7.2 Refinement and a third choice

Note that \(\hat{\pi }_{n-1,t}\) has not countenanced the possibility that a player involves his own state \(s_{t1}\) in the estimation of the other-player multi-state \(s_{t,-1}\). We now show that this is possible at least when the state space S is finite. In that case, we can upgrade the \(\hat{\pi }_{n-1,t}\in \mathcal{P}(S^{n-1})\) in Proposition 2 to \(\hat{\pi }_{n-1,t}(\cdot )=(\hat{\pi }_{n-1,t}(s_{t1}|\cdot )|s_{t1}\in S)\in (\mathcal{P}(S^{n-1}))^S\) and obtain the convergence of \(\int _{S^{n-1}}\hat{\pi }_{n-1,t}(s_{t1}|ds_{t,-1})\cdot v_{nt}(s_{t1},\xi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]})\) to \(v_t(s_{t1},\xi _{[t\bar{t}]},\sigma _t,\chi _{[t\bar{t}]})\) at an \(s_{t1}\)-independent rate. This will lead us to the following extended version of Theorem 2.

Theorem 3

Suppose \(\sigma _{[1,\bar{t}+1]}\) and \(\chi _{[1\bar{t}]}\) are all the same as in Theorem 2. Also, suppose \(\hat{\pi }_{n-1,[1\bar{t}]}(\cdot )=(\hat{\pi }_{n-1,t}(s_{t1}|\cdot )|t=1,2,\ldots ,\bar{t},\;s_{t1}\in S)\in ((\mathcal{P}(S^{n-1}))^S)^{\bar{t}}\) is such that the sequence \(\hat{\pi }_{n-1,t}(s_{t1}|\cdot )\) asymptotically resembles the sequence \(\sigma _t^{\;n-1}\) for each t and \(s_{t1}\). Then, for \(\epsilon >0\) and large enough \(n\in \mathbb {N}\), for every \(t=1,\ldots ,\bar{t}\), \(\xi _{[t\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}-t+1}\), and \(s_{t1}\in S\),

$$\begin{aligned}&\int _{S^{n-1}}\hat{\pi }_{n-1,t}(s_{t1}|ds_{t,-1})\cdot v_{nt}(s_{t1},\chi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]}) \\&\quad \ge \int _{S^{n-1}}\hat{\pi }_{n-1,t}(s_{t1}|ds_{t,-1})\cdot v_{nt}(s_{t1},\xi _{[t\bar{t}]},\varepsilon _{s_{t,-1}},\chi _{[t\bar{t}]})-\epsilon . \end{aligned}$$

To satisfy the condition in Theorem 3, we can still let \(\hat{\pi }_{n-1,[1\bar{t}]}(\cdot )\) be the same as in the aforementioned two examples, in which the newly added \(s_{t1}\)-dependence is vacuous. But a third choice would allow each player a full-fledged Bayesian update on other players’ states.

In this third choice, we still use (28) to define \(\pi _{nt}\). Then, as long as \(\sigma _t(s_{t1})>0\), we let

$$\begin{aligned} \hat{\pi }_{n-1,t}(s_{t1}|\cdot )=\pi _{nt,S}|_{S^{n-1}}(s_{t1}|\cdot ), \end{aligned}$$
(29)

the other-player multi-state distribution derivable from \(\pi _{nt}\) when conditioned on the current player’s state \(s_{t1}\); otherwise, we simply let \(\hat{\pi }_{n-1,t}=\pi _{nt}|_{S^{n-1}}\) just as in the second example. Note the marginal \(\pi _{nt}|_S\) is defined by

$$\begin{aligned} \pi _{nt}|_S(\{s_{t1}\})=\pi _{nt}(\{s_{t1}\}\times S^{n-1}),\quad \forall s_{t1}\in S, \end{aligned}$$
(30)

and each conditional distribution \(\pi _{nt,S}|_{S^{n-1}}(s_{t1}|\cdot )\) is defined by

$$\begin{aligned} \pi _{nt,S}|_{S^{n-1}}(s_{t1}|S')=\frac{\pi _{nt}(\{s_{t1}\}\times S')}{\pi _{nt}|_S(\{s_{t1}\})}=\frac{\pi _{nt}(\{s_{t1}\}\times S')}{\pi _{nt}(\{s_{t1}\}\times S^{n-1})},\quad \forall S'\in \mathcal{B}(S^{n-1}),\qquad \end{aligned}$$
(31)

when the denominator is strictly positive and an arbitrary value otherwise.

7.3 Symmetry makes it work

The mere fact that \(\pi _{nt}\) asymptotically resembles \(\sigma _t^{\;n}\) is actually quite far from being able to dictate the asymptotic resemblance of the thus defined \(\hat{\pi }_{n-1,t}(s_{t1}|\cdot )\) to \(\sigma _t^{\;n-1}\). Note that for a general \(q_n\) resembling some \(p^n\), Lemma 6 in Appendix 1 has all but ruled out the convergence of \(q_n|_A\) to p, let alone the asymptotic resemblance of \(q_{n,A}|_{A^{n-1}}\) to \(p^{n-1}\). Fortunately, \(\pi _{nt}\) still enjoys the additional feature of being symmetric.

For any \(n\in \mathbb {N}\), let \(\Psi _n\) be the set of all n-dimensional permutations. That is, each \(\psi \in \Psi _n\) makes \((\psi (1),\ldots ,\psi (n))\) a permutation of \((1,\ldots ,n)\). For a given \(\psi \in \Psi _n\), let us write \(\psi a=(a_{\psi (1)},\ldots ,a_{\psi (n)})\) for any \(a=(a_1,\ldots ,a_n)\in A^n\), and then \(\psi A'=\{\psi a|a\in A'\}\) for any \(A'\subseteq A^n\). Note that, due to its innately symmetric definition, \(\mathcal{B}(A^n)\) is automatically symmetric in the sense that \(\mathcal{B}(A^n)=\{\psi A'|A'\in \mathcal{B}(A^n)\}\) for any \(\psi \in \Psi _n\).

Definition 2

For \(n\in \mathbb {N}\) and a separable metric space A, we say \(q_n\in \mathcal{P}(A^n)\) is symmetric if

$$\begin{aligned} q_n(A')=q_n(\psi A'),\quad \forall \psi \in \Psi _n,\;A'\in \mathcal{B}(A^n).\end{aligned}$$

We have the much needed result that asymptotic resemblance of \(q_n\) to \(p^n\) does lead to the convergence of \(q_n|_A\) to p when \(q_n\) is symmetric. This is in stark contrast with Lemma 6.

Proposition 3

Let A be a discrete metric space and \(q_n\in \mathcal{P}(A^n)\) for every \(n\in \mathbb {N}\) be symmetric. Suppose the sequence \(q_n\) asymptotically resembles the sequence \(p^n\). Then, the sequence \(q_n|_A\) will converge to p, namely, \(\lim _{n\rightarrow +\infty }q_n|_A(\{a\})=p(\{a\})\) for every \(a\in A\).

This then results in the resemblance of \(q_{n,A}|_{A^{n-1}}\) to \(p^{n-1}\).

Proposition 4

Let A be a discrete metric space and \(q_n\in \mathcal{P}(A^n)\) for every \(n\in \mathbb {N}\) be symmetric. Suppose the sequence \(q_n\) asymptotically resembles the sequence \(p^n\). Then, the sequence \(q_{n,A}|_{A^{n-1}}(a|\cdot )\) will asymptotically resemble the sequence \(p^{n-1}\) for any \(a\in A\) with \(p(\{a\})>0\).

Note that \(\pi _{n1}\), being equal to \(\sigma _1^{\;n}\), is symmetric. As suggested by (28), the operation it has to go through to arrive at \(\pi _{nt}\) is also symmetric. Hence, \(\pi _{nt}\) is symmetric. Therefore, by Proposition 3, the marginal probability \(\pi _{nt}|_S\) as defined in (30) would converge to the NG state distribution \(\sigma _t\); thus, the conditional distribution \(\pi _{nt,S}|_{S^{n-1}}(s_{t1}|\cdot )\) as defined in (31) would be well defined when \(\sigma _t(s_{t1})>0\). Then, Proposition 4 can guarantee that \(\hat{\pi }_{n-1,t}(s_{t1}|\cdot )\) as defined in (29) would asymptotically resemble \(\sigma _t^{\;n-1}\) and hence help to facilitate the condition needed for Theorem 3. The above suggests that, even when players exercise the most accurate Bayesian updates on other players’ states using their own state information, they will not incur much regret on average by adhering to the NG equilibrium.
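Propositions 3 and 4 can be checked numerically in the toy setting: simulate many sample paths of the symmetric \(\pi _{nt}\) of (28), record player 1's period-t state, and compare the resulting estimate of the marginal (30) with \(\sigma _t\). The sketch reuses the earlier illustrative helpers.

```python
def first_player_marginal(sigma1, chis, g, n, t, runs=1000):
    """Simulation estimate of pi_{nt}|_S in (30): the distribution of player 1's
    period-t state when initial states are i.i.d. sigma_1 and everyone follows chis."""
    counts = Counter()
    for _ in range(runs):
        states = [sample(sigma1) for _ in range(n)]
        for chi_t in chis[: t - 1]:
            states = one_period(states, chi_t, g)
        counts[states[0]] += 1
    return {k: c / runs for k, c in counts.items()}

print(first_player_marginal(sigma1, [chi] * 3, g_t, n=50, t=3))
print(ng_trajectory(sigma1, [chi] * 3, g_t)[2])   # sigma_3 for comparison
```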

8 A stationary situation

Now we study an infinite-horizon model with stationary features. To this end, we keep S and X, but let there be a discount factor \(\bar{\alpha }\in [0,1)\). There is a payoff function \(\tilde{f}\) which meets the basic measurability and boundedness requirements, so that \(\tilde{f}_t=\bar{\alpha }^{t-1}\cdot \tilde{f}\) for \( t=1,2,\ldots \). Let us use \(\bar{f}\) for the bound \(\bar{f}_1\) that appeared in (1). In addition, there is a state transition kernel \(\tilde{g}\in \mathcal{G}(S,X)\), so that \(\tilde{g}_t=\tilde{g}\) for \(t=1,2,\ldots \). For \(\chi \in \mathcal{K}(S,X)\), denote by \(T(\chi )\) the operator on \(\mathcal{P}(S)\), so that for any \(\sigma \in \mathcal{P}(S)\),

$$\begin{aligned} T(\chi )\circ \sigma =\sigma \odot \chi \odot \tilde{g}(\cdot ,\cdot ,\sigma \otimes \chi ). \end{aligned}$$
(32)

Thus, state transition has been made stationary by the stationarity of \(\tilde{g}\).

Denote the stationary nonatomic game formed from the above S, X, \(\bar{\alpha }\), \(\tilde{f}\), and \(\tilde{g}\) by \(\Gamma ^\infty \). It helps to first study the corresponding games \(\Gamma ^t\) that terminate in period \(t+1\), for \(t=0,1,\ldots \). Now let \(v^t(s,\xi _{[1t]},\sigma ,\chi _{[1t]})\) be the total expected payoff a player can receive in game \(\Gamma ^t\), when he starts at state \(s\in S\) in period 1 and adopts action plan \(\xi _{[1t]}\in (\mathcal{K}(S,X))^t\) from period 1 to t, while all other players form state distribution \(\sigma \in \mathcal{P}(S)\) in the beginning and act according to \(\chi _{[1t]}\in (\mathcal{K}(S,X))^t\) from period 1 to t. As a terminal condition, we have \(v^0(s,\sigma )=0\). Also, for \(t=1,2,\ldots \),

$$\begin{aligned} v^t(s,\xi _{[1t]},\sigma ,\chi _{[1t]})= & {} \int _X \xi _1(s|dx)\cdot \left[ \phantom {\int _{S^n}}\tilde{f}(s,x,\sigma \otimes \chi _1)\nonumber \right. \\&\left. +\,\bar{\alpha }\cdot \int _S \tilde{g}(s,x,\sigma \otimes \chi _1|ds')\cdot v^{t-1}(s',\xi _{[2t]},T(\chi _1)\circ \sigma ,\chi _{[2t]})\right] .\nonumber \\ \end{aligned}$$
(33)

Using the terminal condition and (33), we can inductively show that

$$\begin{aligned} \mid v^{t+1}(s,\xi _{[1,t+1]},\sigma ,\chi _{[1,t+1]})-v^t(s,\xi _{[1t]},\sigma ,\chi _{[1t]})\mid \le \bar{\alpha }^t\cdot \bar{f}. \end{aligned}$$
(34)
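For completeness, the induction behind (34) can be sketched as follows. When \(\xi _{[1t]}\) and \(\chi _{[1t]}\) are the leading portions of \(\xi _{[1,t+1]}\) and \(\chi _{[1,t+1]}\), the \(\tilde{f}\)-terms of (33) cancel in the difference, leaving

$$\begin{aligned}&\mid v^{t+1}(s,\xi _{[1,t+1]},\sigma ,\chi _{[1,t+1]})-v^t(s,\xi _{[1t]},\sigma ,\chi _{[1t]})\mid \\&\quad =\bar{\alpha }\cdot \left| \int _X \xi _1(s|dx)\int _S \tilde{g}(s,x,\sigma \otimes \chi _1|ds')\cdot \left[ v^t(s',\xi _{[2,t+1]},T(\chi _1)\circ \sigma ,\chi _{[2,t+1]})-v^{t-1}(s',\xi _{[2t]},T(\chi _1)\circ \sigma ,\chi _{[2t]})\right] \right| \\&\quad \le \bar{\alpha }\cdot \sup \limits _{s'\in S}\mid v^t(s',\xi _{[2,t+1]},T(\chi _1)\circ \sigma ,\chi _{[2,t+1]})-v^{t-1}(s',\xi _{[2t]},T(\chi _1)\circ \sigma ,\chi _{[2t]})\mid \le \bar{\alpha }\cdot \bar{\alpha }^{t-1}\cdot \bar{f}=\bar{\alpha }^t\cdot \bar{f}, \end{aligned}$$

with the base case \(\mid v^1(s,\xi _{[11]},\sigma ,\chi _{[11]})-v^0(s,\sigma )\mid =\mid \int _X \xi _1(s|dx)\cdot \tilde{f}(s,x,\sigma \otimes \chi _1)\mid \le \bar{f}\).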

Given \(s\in S\), \(\xi _{[1\infty ]}=(\xi _1,\xi _2,\ldots )\in (\mathcal{K}(S,X))^\infty \), \(\sigma \in \mathcal{P}(S)\), and \(\chi _{[1\infty ]}=(\chi _1,\chi _2,\ldots )\in (\mathcal{K}(S,X))^\infty \), the sequence \(\{v^t(s,\xi _{[1t]},\sigma ,\chi _{[1t]})\mid t=0,1,\ldots \}\) is thus Cauchy and converges to a limit \(v^\infty (s,\xi _{[1\infty ]},\sigma ,\chi _{[1\infty ]})\). The latter is the total discounted expected payoff a player can obtain in the game \(\Gamma ^\infty \), when he starts at state s and adopts action plan \(\xi _{[1\infty ]}\), while all other players form initial pre-action environment \(\sigma \) and act according to \(\chi _{[1\infty ]}\).
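A minimal, purely illustrative numerical sketch of the recursion (33) and the bound (34) follows, assuming small finite S and X with hypothetical primitives f, g, chi, and sigma (only their shapes and bounds matter); it computes the truncated values and confirms that consecutive gaps decay geometrically.

import numpy as np

nS, nX = 2, 2
alpha_bar = 0.9
f_bar = 1.0                                          # bound on |f|

def f(s, x, tau):                                    # one-period payoff, |f| <= f_bar
    return np.tanh(x - s + tau[0, 0])

def g(s, x, tau):                                    # transition law over S
    p_high = 0.3 + 0.5 * tau[s, x]
    return np.array([1.0 - p_high, p_high])

def T(chi, sigma):                                   # the operator of (32)
    tau = sigma[:, None] * chi
    return sum(tau[s, x] * g(s, x, tau) for s in range(nS) for x in range(nX))

def v(t, s, xi_plan, sigma, chi_plan):
    """v^t of (33); xi_plan and chi_plan are lists of S-by-X stochastic matrices."""
    if t == 0:
        return 0.0
    xi1, chi1 = xi_plan[0], chi_plan[0]
    tau = sigma[:, None] * chi1
    sigma_next = T(chi1, sigma)                      # T(chi_1) o sigma
    total = 0.0
    for x in range(nX):
        cont = sum(g(s, x, tau)[s2] * v(t - 1, s2, xi_plan[1:], sigma_next, chi_plan[1:])
                   for s2 in range(nS))
        total += xi1[s, x] * (f(s, x, tau) + alpha_bar * cont)
    return total

chi = np.array([[0.7, 0.3], [0.2, 0.8]])
sigma0 = np.array([0.6, 0.4])
horizon = 8
vals = [v(t, 0, [chi] * t, sigma0, [chi] * t) for t in range(horizon)]
gaps = [abs(vals[t + 1] - vals[t]) for t in range(horizon - 1)]
# check the geometric bound (34): |v^{t+1} - v^t| <= alpha_bar^t * f_bar
print(all(gaps[t] <= alpha_bar ** t * f_bar + 1e-12 for t in range(horizon - 1)))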

A pre-action environment \(\sigma \in \mathcal{P}(S)\) is said to be associated with \(\chi \in \mathcal{K}(S,X)\) when

$$\begin{aligned} \sigma =T(\chi )\circ \sigma . \end{aligned}$$
(35)

That is, environment \(\sigma \) is associated with action plan \(\chi \) when the former is invariant under the one-period transition generated when all players adhere to the latter. For \(\chi \in \mathcal{K}(S,X)\), we use \(\chi ^\infty \) to represent the stationary policy profile \((\chi ,\chi ,\ldots )\in (\mathcal{K}(S,X))^\infty \) that players are to adopt in all periods \(t=1,2,\ldots \).
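For finite S and X, one natural way to look for a \(\sigma \) satisfying (35) is to iterate the operator \(T(\chi )\) from an arbitrary starting point, as in the minimal sketch below; the kernel g and plan chi are hypothetical, and the iteration is only a heuristic, since (35) merely asks for a fixed point, which in general may have to be located by other means.

import numpy as np

def g(s, x, tau):                                    # hypothetical transition law
    p_high = 0.25 + 0.5 * tau[s, x]
    return np.array([1.0 - p_high, p_high])

def T(chi, sigma):                                   # the operator of (32)
    tau = sigma[:, None] * chi
    return sum(tau[s, x] * g(s, x, tau) for s in range(2) for x in range(2))

chi = np.array([[0.6, 0.4], [0.3, 0.7]])
sigma = np.array([0.5, 0.5])
for _ in range(500):                                 # iterate sigma <- T(chi) o sigma
    new_sigma = T(chi, sigma)
    if np.max(np.abs(new_sigma - sigma)) < 1e-12:
        break
    sigma = new_sigma
print(sigma, np.allclose(sigma, T(chi, sigma)))      # candidate sigma for (35)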

We deem one-time action plan \(\chi \in \mathcal{K}(S,X)\) a stationary Markov equilibrium for the nonatomic game \(\Gamma ^\infty \), when there exists a \(\sigma \in \mathcal{P}(S)\) that is associated with the given \(\chi \), so that for every one-time unilateral deviation \(\xi \in \mathcal{K}(S,X)\),

$$\begin{aligned} v^\infty (s,\chi ^\infty ,\sigma ,\chi ^\infty ) \ge v^\infty (s,(\xi ,\chi ^\infty ),\sigma ,\chi ^\infty ),\quad \forall s\in S. \end{aligned}$$
(36)

Therefore, a policy will be considered an equilibrium when it induces an invariant environment under whose sway the policy turns out to be a best response in the long run.

Now we move on to the n-player game \(\Gamma ^\infty _n\) made out of the same S, X, \(\bar{\alpha }\), \(\tilde{f}\), and \(\tilde{g}\). Similarly to the above, we let \(\Gamma ^t_n\) be its n-player counterpart that terminates in period \(t+1\). Now let \(v^t_n(s_1,\xi _{[1t]},\varepsilon _{s_{-1}},\chi _{[1t]})\) be the total expected payoff player 1 can receive in game \(\Gamma ^t_n\), when he starts with state \(s_1\in S\) and adopts action plan \(\xi _{[1t]}\in (\mathcal{K}(S,X))^t\) from period 1 to t, while other players form initial empirical distribution \(\varepsilon _{s_{-1}}=\varepsilon _{(s_2,\ldots ,s_n)}\in \mathcal{P}_{n-1}(S)\) and adopt policy \(\chi _{[1t]}\in (\mathcal{K}(S,X))^t\) from period 1 to t. As a terminal condition, we have \(v^0_n(s_1,\varepsilon _{s_{-1}})=0\). For \(t=1,2,\ldots \), it follows that

$$\begin{aligned} v^t_n(s_1,\xi _{[1t]},\varepsilon _{s_{-1}},\chi _{[1t]})= & {} \int _X\xi _1(s_1|dx_1)\cdot \int _{X^{n-1}}\chi _1^{\; n-1}(s_{-1}|dx_{-1})\cdot \left[ \tilde{f}(s_1,x_1,\varepsilon _{s_{-1}x_{-1}})\right. \\&\left. +\,\bar{\alpha }\cdot \int _{S^n}\tilde{g}^n(s,x|ds')\cdot v^{t-1}_n(s'_1,\xi _{[2t]},\varepsilon _{s'_{-1}},\chi _{[2t]})\right] . \end{aligned}$$
(37)

Using the terminal condition and (37), we can inductively show that

$$\begin{aligned} \mid v^{t+1}_n(s_1,\xi _{[1,t+1]},\varepsilon _{s_{-1}},\chi _{[1,t+1]})-v^t_n(s_1,\xi _{[1t]},\varepsilon _{s_{-1}},\chi _{[1t]})\mid \le \bar{\alpha }^t\cdot \bar{f}. \end{aligned}$$
(38)

Given \(s_1\in S\), \(\xi _{[1\infty ]}\in (\mathcal{K}(S,X))^\infty \), \(\varepsilon _{s_{-1}}\in \mathcal{P}_{n-1}(S)\), and \(\chi _{[1\infty ]}\in (\mathcal{K}(S,X))^\infty \), the sequence \(\{v^t_n(s_1,\xi _{[1t]},\varepsilon _{s_{-1}},\chi _{[1t]})\mid t=0,1,\ldots \}\) is Cauchy and converges to a limit \(v^\infty _n(s_1,\xi _{[1\infty ]},\varepsilon _{s_{-1}},\chi _{[1\infty ]})\). The latter is the total discounted expected payoff a player can obtain in \(\Gamma ^\infty _n\), when he starts at state \(s_1\) and adopts action plan \(\xi _{[1\infty ]}\), while all other players form the initial pre-action environment \(\varepsilon _{s_{-1}}\) and act according to \(\chi _{[1\infty ]}\).

For the current setting, it should be noted that Assumptions 1 and 2 translate into continuity in \(\tau \), at an \((s,x)\)-independent rate, of the transition kernel \(\tilde{g}(s,x,\tau )\) and the payoff function \(\tilde{f}(s,x,\tau )\), respectively. We now present the main result for the stationary case.

Theorem 4

Suppose \(\chi \in \mathcal{K}(S,X)\) is a stationary Markov equilibrium for the stationary nonatomic game \(\Gamma ^\infty \). Let \(\hat{\pi }_{n-1}\in \mathcal{P}(S^{n-1})\) for each \(n\in \mathbb {N}\setminus \{1\}\). Also suppose the sequence \(\hat{\pi }_{n-1}\) asymptotically resembles the sequence \(\sigma ^{n-1}\), where \(\sigma \) is associated with \(\chi \) in the equilibrium definitions (35) and (36). Then, \(\chi ^\infty \) is asymptotically an equilibrium for the games \(\Gamma ^\infty _n\) in an average sense. More specifically, for any \(\epsilon >0\) and large enough \(n\in \mathbb {N}\),

$$\begin{aligned}&\int _{S^{n-1}}\hat{\pi }_{n-1}(ds_{-1})\cdot v^\infty _n(s_1,\chi ^\infty ,\varepsilon _{s_{-1}},\chi ^\infty )\\&\quad \ge \int _{S^{n-1}}\hat{\pi }_{n-1}(ds_{-1})\cdot v^\infty _n(s_1,\xi _{[1\infty ]},\varepsilon _{s_{-1}},\chi ^\infty )-\epsilon , \end{aligned}$$

for any \(s_1\in S\) and \(\xi _{[1\infty ]}\in (\mathcal{K}(S,X))^\infty \).

Theorem 4 says that players in a large finite stationary game will not regret much by adopting a stationary equilibrium of the corresponding stationary nonatomic game. The regret, measured in an average sense, remains small so long as the underlying other-player multi-state distribution \(\hat{\pi }_{n-1}\) is close to an invariant \(\sigma \) associated with the NG equilibrium. Just as in Sect. 7, we can let \(\hat{\pi }_{n-1}=\sigma ^{n-1}\), indicating that players take a “lazy” approach in assessing other players’ states. We leave discussion of other possibilities to Appendix 5.

9 Implications of main results

9.1 Observation, remembrance, and coordination

Regarding Theorems 2 and 3, we note the following for \(\bar{t}\)-period games. A prominent feature of an NG equilibrium \(\chi _{[1\bar{t}]}\in (\mathcal{K}(S,X))^{\bar{t}}\) is its insensitivity, at any period t, to a player’s personal history \((s_{t'},x_{t'}|t'=1,2,\ldots ,t-1)\), historical data regarding other players, and the present information about other players’ states. Independence of the first two factors has much to do with the Markovian setup of the game—neither \(\tilde{f}_t\) nor \(\tilde{g}_t\) depends on past history. But the more interesting independence of the latter two factors stems from players’ common knowledge about the evolution of their environments. The \((\sigma _{t'}\otimes \chi _{t'}|t'=1,2,\ldots ,t-1)\) portion of the history and the present information \(\sigma _t\), both about other players, are determinable by (10) before the game is even played out.

For finite semi-anonymous games, however, information is gradually revealed and its perfection is not guaranteed. We can define a space \(O_S\) and a map \(\tilde{o}_S:\mathcal{P}(S)\rightarrow O_S\) to represent a player’s observational power over his present pre-action environment immediately before actual play. Similarly, we can define a space \(O_{SX}\) and a map \(\tilde{o}_{SX}:\mathcal{P}(S\times X)\rightarrow O_{SX}\) to represent his observational power over the in-action environment just experienced. To ensure that new information neither contradicts old information nor gets lost, we suppose a function \(\tilde{o}_S^{\;SX}:O_{SX}\rightarrow O_S\) exists, with \(\tilde{o}_S^{\;SX}(\tilde{o}_{SX}(\tau ))=\tilde{o}_S(\tau |_S)\) for any \(\tau \in \mathcal{P}(S\times X)\).

With these definitions, a player’s decision in period t can be denoted by a map \(\hat{\chi }_t: (S\times X\times O_{SX})^{t-1}\times O_S\times S\rightarrow \mathcal{P}(X)\). In the period, player 1’s random decision rule can be written as \(\hat{\chi }_t(\tilde{h}_t,\tilde{o}_S(\varepsilon _{s_{t,-1}}),s_{t1}|\cdot )\), where the history \(\tilde{h}_t\) is expressible as

$$\begin{aligned} \tilde{h}_t=(s_{t'1},x_{t'1},\tilde{o}_{SX}(\varepsilon _{s_{t',-1}x_{t',-1}})|t'=1,2,\ldots ,t-1), \end{aligned}$$
(39)

\(\tilde{o}_S(\varepsilon _{s_{t,-1}})\) is his observation of other players’ status, and \(s_{t1}\) represents the player’s own state. There is a whole spectrum in which \(O_S\) and \(\tilde{o}_S\) can reside. When \(O_S=\{0\}\) and \(\tilde{o}_S(\cdot )=0\), players are ignorant of others’ states; when \(O_S=\mathcal{P}(S)\) and \(\tilde{o}_S\) is the identity map, every player is fully aware of his surroundings. Similarly, there are varieties of \(O_{SX}\), \(\tilde{o}_{SX}\), and \(\tilde{o}_S^{\;SX}\).
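For illustration only, the sketch below writes down three hypothetical points on this spectrum for a finite S, with a distribution over S represented as a Python dictionary.

from typing import Dict

Dist = Dict[str, float]                    # stands in for P(S)

def o_S_ignorant(tau: Dist) -> int:        # O_S = {0}: nothing is observed
    return 0

def o_S_full(tau: Dist) -> Dist:           # O_S = P(S): the identity map
    return dict(tau)

def o_S_coarse(tau: Dist) -> float:        # an in-between choice: one frequency only
    return tau.get("s_bar", 0.0)

env = {"s_bar": 0.3, "s_other": 0.7}
print(o_S_ignorant(env), o_S_coarse(env), o_S_full(env))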

Theorems 2 and 3, however, remove the need to delve into the \((O_S,\tilde{o}_S,O_{SX},\tilde{o}_{SX},\tilde{o}_S^{\;SX})\)-related details of finite games. They state that an equilibrium of the NG counterpart, which is necessarily both oblivious of the past history \(\tilde{h}_t\) and blind to the present observation \(\tilde{o}_S(\varepsilon _{s_{t,-1}})\), serves as a good approximate equilibrium for games with enough players. The absence of \(\tilde{h}_t\) again has a Markovian explanation. On the other hand, the ability to shake off the influence of \(\tilde{o}_S(\varepsilon _{s_{t,-1}})\) is very important, since it saves players the effort of gathering information about their surroundings.

Regarding Theorem 4, we note the following. Each of our finite stationary games is a discounted stochastic game. For an n-player version of the latter game in which players have full knowledge of others’ states, equilibria are hard to compute and, for their implementation, require a high degree of coordination among players; see Solan (1998). These equilibria come from the space \((2^{\mathbb {R}^n})^{S^n}\times ((\mathbb {R}^n)^{X^n\times S^n})^{S^n\times \mathbb {R}^n}\), whereas our NG equilibria come from \(\mathbb {R}^{S\times X}\). Meanwhile, the discounted stochastic game one faces in real life is often semi-anonymous; see, e.g., examples listed in Jovanovic and Rosenthal (1988). For such a game, Theorem 4 has shown that a much easier path can be taken to coordinate player behavior under an \(\epsilon \)-sized compromise. If players all agree to exercise a corresponding NG equilibrium, the typical player 1 has only to respond to his own state \(s_{t1}\) without giving up too much.

9.2 Sources of NG equilibria

To further buttress the claim that studying the idealized NGs can help with the understanding and execution of the messier finite games faced in real life, we demonstrate that NG equilibria, meeting criteria (23) and (24) for the transient case and (35) and (36) for the stationary case, can be obtained relatively easily.

First, we concentrate on the transient case studied in Sects. 3 to 6. From (22),

$$\begin{aligned} v_t(s_t,(\xi _t,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]})=\int _X \xi _t(s_t|dy)\cdot v_t(s_t,(\delta _y,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]}). \end{aligned}$$
(40)

Hence,

$$\begin{aligned} \sup \limits _{\xi _t\in \mathcal{K}(S,X)}v_t(s_t,(\xi _t,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]})=\sup \limits _{y\in X}v_t(s_t,(\delta _y,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]}).\quad \end{aligned}$$
(41)

So the equilibrium criterion (23) that we conveniently used for the \(\bar{t}\)-period case is equivalent to requiring, for every \(t=1,2,\ldots ,\bar{t}\),

$$\begin{aligned} \chi _t(s_t|\tilde{X}_t(s_t,\sigma _t,\chi _{[t\bar{t}]}))=1,\quad \forall s_t\in S, \end{aligned}$$
(42)

where

$$\begin{aligned} \tilde{X}_t(s_t,\sigma _t,\chi _{[t\bar{t}]})= & {} \{x\in X|v_t(s_t,(\delta _x,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]})\nonumber \\&\quad =\sup \limits _{y\in X}v_t(s_t,(\delta _y,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]})\}, \end{aligned}$$
(43)

and \(\sigma _t\) is defined through (24).
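On finite S and X, checking (42) and (43) amounts to an argmax computation. The minimal sketch below assumes a hypothetical table dev_value of the one-shot deviation values \(v_t(s_t,(\delta _x,\chi _{[t+1,\bar{t}]}),\sigma _t,\chi _{[t\bar{t}]})\) has already been produced, say by backward induction, and then verifies that each \(\chi _t(s_t|\cdot )\) concentrates on the maximizing actions.

import numpy as np

dev_value = np.array([[1.0, 1.0, 0.4],      # rows: states, columns: actions (hypothetical)
                      [0.2, 0.9, 0.9]])
chi_t = np.array([[0.5, 0.5, 0.0],          # chi_t(s | .) for each state s (hypothetical)
                  [0.0, 1.0, 0.0]])

tol = 1e-9
for s in range(dev_value.shape[0]):
    best = dev_value[s].max()
    X_tilde = dev_value[s] >= best - tol            # the set in (43)
    satisfied = chi_t[s][~X_tilde].sum() <= tol     # condition (42)
    print(f"state {s}: argmax actions {np.flatnonzero(X_tilde)}, (42) holds: {satisfied}")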

The form consisting of (42) and (43) is fairly close to the distributional-equilibrium concept used in NG literature, such as Mas-Colell (1984) and Jovanovic and Rosenthal (1988). A distributional equilibrium is an in-action environment sequence \(\tau _{[1\bar{t}]}=(\tau _t|t=1,2,\ldots ,\bar{t})\in (\mathcal{P}(S\times X))^{\bar{t}}\) which satisfies \(\tau _t(\tilde{U}_t(\tau _{[t\bar{t}]}))=1\) for each \(t=1,2,\ldots ,\bar{t}\). Here, \(\tilde{U}_t(\tau _{[t\bar{t}]})= \{(s,x)\in S\times X|v'_t(s,x,\tau _{[t\bar{t}]})=\sup \nolimits _{y\in X}v'_t(s,y,\tau _{[t\bar{t}]})\}\), and \(v'_t(s,y,\tau _{[t\bar{t}]})\) is a player’s payoff when he starts period t with state s and action y, but other players in all periods and he himself in later periods act according to \(\tau _{[t\bar{t}]}\); corresponding to (24), the distributional equilibrium also satisfies \(\tau _1|_S=\sigma _1\) and \(\tau _t|_S=\tau _{t-1}\odot \tilde{g}_{t-1}(\cdot ,\cdot ,\tau _{t-1})\) for \(t=2,3,\ldots ,\bar{t}\). According to Jovanovic and Rosenthal (1988, Theorem 1), such an equilibrium \(\tau _{[1\bar{t}]}\) would exist when S and X are compact, each payoff \(\tilde{f}_t\) is bounded and continuous in all arguments, and each transition kernel \(\tilde{g}_t\) is continuous in all arguments.

When an equilibrium \(\chi _{[1\bar{t}]}\) in our conditional sense exists, we can construct a distributional equilibrium \(\tau _{[1\bar{t}]}\) by resorting iteratively to \(\tau _t=\sigma _t\otimes \chi _t\) and \(\sigma _{t+1}=T_t(\chi _t)\circ \sigma _t\) for \(t=1,2,\ldots ,\bar{t}\). Conversely, when the latter distributional equilibrium \(\tau _{[1\bar{t}]}\) is available, we can nearly recover a conditional equilibrium \(\chi _{[1\bar{t}]}\). For each \(t=1,2,\ldots ,\bar{t}\), according to Duffie et al. (1994, p. 751), we can identify a \(\chi _t\in \mathcal{K}(S,X)\), which also passes as a measurable map from S to \(\mathcal{P}(X)\), that satisfies \(\tau _t=\tau _t|_S\otimes \chi _t\). Thus, we can construct \(\chi _{[1\bar{t}]}\) consecutively from \(\chi _1\) up to \(\chi _{\bar{t}}\). But even then, \(\chi _{[t\bar{t}]}\) along with \(\sigma _t=\tau _t|_S\) would satisfy (42) only for \(\tau _t|_S\)-almost every \(s_t\), but not necessarily every \(s_t\in S\). For instance, we can suppose \(S=\{\bar{s}_1,\bar{s}_2,\ldots \}\). At each t, the constructed \(\chi _{[1\bar{t}]}\) could guarantee (42) for those \(\bar{s}_i\)’s with \((\tau _t|_S)(\{\bar{s}_i\})>0\) but not those with \((\tau _t|_S)(\{\bar{s}_i\})=0\). On the other hand, a conditional equilibrium \(\chi _{[1\bar{t}]}\) can be obtained directly; see section “The transient case” in Appendix 6 for details.
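On finite S and X, both directions of the construction just described are transparent, as the minimal sketch below with hypothetical numbers shows: one direction builds \(\tau _t=\sigma _t\otimes \chi _t\), the other disintegrates a given \(\tau _t\) back into a kernel, and the arbitrary choice made at zero-probability states is precisely where (42) can fail off the support.

import numpy as np

sigma = np.array([0.0, 0.4, 0.6])              # note the zero-probability state
chi = np.array([[1.0, 0.0],
                [0.3, 0.7],
                [0.8, 0.2]])

tau = sigma[:, None] * chi                     # tau_t = sigma_t (x) chi_t
tau_S = tau.sum(axis=1)                        # marginal tau_t|_S

chi_back = np.empty_like(chi)
for s in range(len(sigma)):
    if tau_S[s] > 0:
        chi_back[s] = tau[s] / tau_S[s]        # disintegration where well defined
    else:
        chi_back[s] = np.full(chi.shape[1], 1.0 / chi.shape[1])  # arbitrary choice
print(tau, tau_S, chi_back, sep="\n")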

When it comes to the stationary case examined in Sect. 8, we make parallel developments. Here the property corresponding to (36) is

$$\begin{aligned} \chi (s|\tilde{X}_\infty (s,\sigma ,\chi ))=1,\quad \forall s\in S, \end{aligned}$$
(44)

where

$$\begin{aligned} \tilde{X}_\infty (s,\sigma ,\chi )=\left\{ x\in X|v^\infty (s,(\delta _x,\chi ^\infty ),\sigma ,\chi ^\infty )=\sup \limits _{y\in X}v^\infty (s,(\delta _y,\chi ^\infty ),\sigma ,\chi ^\infty )\right\} , \end{aligned}$$
(45)

and \(\sigma \) satisfies (35). Again, the existence of a related distributional equilibrium \(\tau \in \mathcal{P}(S\times X)\) is known under quite general conditions; see, e.g., Jovanovic and Rosenthal (1988, Theorem 2). However, an equilibrium \(\tau \) does not exactly lead to a conditional equilibrium \(\chi \). So once more we focus on a direct approach for the stationary case; see section “The stationary case” in Appendix 6.

10 Concluding remarks

Under a common action plan, we have shown that environments faced by players in multi-period large finite games stay close to those of their NG counterparts. For both transient and stationary settings, our results reveal that an NG equilibrium, necessarily oblivious of past history and blind to the present status of other players, can serve as a good approximate equilibrium in large finite games. We acknowledge that the discreteness requirement on both the state and action spaces can be restrictive in some circumstances. Besides relaxing this restriction, future research can also look into the issue of convergence rates.