1 Introduction

Approximate Nash equilibria. Nash equilibria are the central solution concept in game theory. Since it is known that computing an exact Nash equilibrium [6, 11] is unlikely to be achievable in polynomial time, a line of work has arisen that studies the computational aspects of approximate Nash equilibria. The most widely studied notion is that of an \(\epsilon \)-approximate Nash equilibrium (\(\epsilon \)-Nash), which requires that all players have an expected payoff that is within \(\epsilon \) of a best response. This is an additive notion of approximate equilibrium; the problem of computing approximate equilibria of bimatrix games using a relative notion of approximation is known to be \(\mathtt {PPAD}\)-hard even for constant approximations [10].

So far, \(\epsilon \)-Nash equilibria have mainly been studied in the context of two-player bimatrix games. A line of work [3, 12, 13] has investigated the best \(\epsilon \) that can be guaranteed in polynomial time for bimatrix games. The current best result, due to Tsaknakis and Spirakis [28], is a polynomial-time algorithm that finds a 0.3393-Nash equilibrium of a bimatrix game with all payoffs in [0, 1].

In this paper, we study \(\epsilon \)-Nash equilibria in the context of many-player games, a topic that has received much less attention. A simple approximation algorithm for many-player games can be obtained by generalising the algorithm of Daskalakis et al. [13] from the two-player setting to the n-player setting, which provides a guarantee of \(\epsilon = 1-\frac{1}{n}\). This has since been improved independently by three sets of authors [3, 4, 22]. They provide a method that converts a polynomial-time algorithm for finding \(\epsilon \)-Nash equilibria in \((n-1)\)-player games into an algorithm that finds a \(\frac{1}{2-\epsilon }\)-Nash equilibrium in n-player games. Using the polynomial-time 0.3393-approximation algorithm of Tsaknakis and Spirakis [28] for 2-player games as the base case for this recursion yields polynomial-time algorithms with approximation guarantees of 0.6022 for 3-player games and 0.7153 for 4-player games. These guarantees tend to 1 as n increases, and so far, no constant \(\epsilon <1\) is known such that, for all n, an \(\epsilon \)-Nash equilibrium of an n-player game can be computed in polynomial time.

For n-player games, we have lower bounds for \(\epsilon \)-Nash equilibria. More precisely, Rubinstein has shown that when n is not a constant there exists a constant but very small \(\epsilon \) such that it is \(\mathtt {PPAD}\)-hard to compute an \(\epsilon \)-Nash equilibrium [27]. This is quite different from the bimatrix game setting, where the existence of a quasi-polynomial time approximation scheme rules out such a lower bound, unless all of \(\mathtt {PPAD}\) can be solved in quasi-polynomial time [26].

Polymatrix games. In this paper, we focus on a particular class of many-player games called polymatrix games. In a polymatrix game, the interaction between the players is specified by an n-vertex graph, where each vertex represents one of the players. Each edge of the graph specifies a bimatrix game that will be played by the two respective players, and thus a player with degree d plays d bimatrix games simultaneously. More precisely, each player picks a strategy, and then plays this strategy in all of the bimatrix games that he is involved in. His payoff is then the sum of the payoffs that he obtains in each of the games.

Polymatrix games are a class of succinctly represented n-player games: a polymatrix game is specified by at most \(n^2\) bimatrix games, each of which can be written down in quadratic space with respect to the number of strategies. This is unlike general n-player strategic form games, which require a representation that is exponential in the number of players.

The problem of computing exact Nash equilibria in polymatrix games can be tackled in exponential time by Lemke’s algorithm [24]. For the special subclass of generalized zero-sum games on networks it was proved by Cai and Daskalakis [5] that this problem can be solved in polynomial time. There has been relatively little work on approximation algorithms for polymatrix games. The approximation algorithms for general games can be applied in this setting in an obvious way, but to the best of our knowledge there have been no upper bounds that are specific to polymatrix games. On the other hand, the lower bound of Rubinstein mentioned above is actually proved by constructing polymatrix games. Thus, there is a constant but very small \(\epsilon \) such that it is \(\mathtt {PPAD}\)-hard to compute an \(\epsilon \)-Nash equilibrium of a polymatrix game [27], and this again indicates that approximating equilibria in polymatrix games is quite different to approximating equilibria in bimatrix games.

Our contribution. Our main result is an algorithm that, for every \(\delta \) in the range \(0 < \delta \le 0.5\), finds a \((0.5 + \delta )\)-Nash equilibrium of a polymatrix game in time polynomial in the input size and \(\frac{1}{\delta }\). Note that our approximation guarantee does not depend on the number of players, which is a property that was not previously known to be achievable for polymatrix games, and still cannot be achieved for general strategic form games.

We prove this result by adapting the algorithm of Tsaknakis and Spirakis [28] (henceforth referred to as the TS algorithm). They give a gradient descent algorithm for finding a 0.3393-Nash equilibrium in a bimatrix game. We generalise their gradient descent techniques to the polymatrix setting, and show that it always arrives at a \((0.5 + \delta )\)-Nash equilibrium after a polynomial number of iterations.

In order to generalise the TS algorithm, we had to overcome several issues. Firstly, the TS algorithm makes the regrets of the two players equal in every iteration, but there is no obvious way to achieve this in the polymatrix setting. Instead, we show how gradient descent can be applied to a strategy profile where the regrets are not necessarily equal. Secondly, the output of the TS algorithm is either a point found by gradient descent, or a point obtained by modifying the result of gradient descent. In the polymatrix game setting, it is not immediately obvious how such a modification can be derived with a non-constant number of players (without an exponential blowup). Thus we apply a different analysis, which proves that the point resulting from gradient descent always has our approximation guarantee. It is an interesting open question whether a better approximation guarantee can be achieved when there is a constant number of players.

An interesting feature of our algorithm is that it can be applied even when players have differing degrees. Originally, polymatrix games were defined only for complete graphs [24]. Since previous work has only considered lower bounds for polymatrix games, it has been sufficient to restrict attention to regular graphs, as in the work of Rubinstein [27]. However, since this paper proves an upper bound, we must be more careful. As it turns out, our algorithm efficiently finds a \((0.5+\delta )\)-Nash equilibrium for all \(\delta >0\), no matter what graph structure the polymatrix game has.

Finally, we show that our algorithm can be applied to two-player Bayesian games. In a two-player Bayesian game, each player is assigned a type according to a publicly known probability distribution. Each player knows their own type, but does not know the type of their opponent. Howson and Rosenthal showed that the problem of finding an exact equilibrium in a two-player Bayesian game is equivalent to finding an exact equilibrium in a polymatrix game [23]. We show that this correspondence also holds for approximate equilibria: finding an \(\epsilon \)-Nash equilibrium in these games can be reduced to the problem of finding an \(\epsilon \)-Nash equilibrium in a polymatrix game, and therefore, our algorithm can be used to efficiently find a \((0.5+\delta )\)-Nash equilibrium of a two-player Bayesian game.

Related work. An FPTAS for the problem of computing an \(\epsilon \)-Nash equilibrium of a bimatrix game does not exist unless every problem in \(\mathtt {PPAD}\) can be solved in polynomial time [6]. Arguably, the biggest open question in equilibrium computation is whether there exists a PTAS for this problem. As we have mentioned, for any constant \(\epsilon >0\), there does exist a quasi-polynomial-time algorithm for computing an \(\epsilon \)-Nash equilibrium of a bimatrix game, or any game with a constant number of players [2, 26], with running time \(k^{O(\log k)}\) for a \(k \times k\) bimatrix game. Consequently, in contrast to the many-player case, it is not believed that there exists a constant \(\epsilon \) such that the problem of computing an \(\epsilon \)-Nash equilibrium of a bimatrix game (or any game with a constant number of players) is \(\mathtt {PPAD}\)-hard, since it seems unlikely that all problems in \(\mathtt {PPAD}\) have quasi-polynomial-time algorithms. On the other hand, for multi-player games, as mentioned above, there is a small constant \(\epsilon \) such that it is \(\mathtt {PPAD}\)-hard to compute an \(\epsilon \)-Nash equilibrium of an n-player game when n is not constant. One positive result we do have for multi-player games is that there is a PTAS for anonymous games (where the identity of players does not matter) when the number of strategies is constant [14].

Polymatrix games have played a central role in the reductions that have been used to show \(\mathtt {PPAD}\)-hardness of games and other equilibrium problems [6, 7, 11, 16, 19]. Computing an exact Nash equilibrium in a polymatrix game is \(\mathtt {PPAD}\)-hard even when all the bimatrix games played are either zero-sum games or coordination games [5]. Polymatrix games have been used in other contexts too. For example, Govindan and Wilson proposed a (non-polynomial-time) algorithm for computing Nash equilibria of an n-player game, by approximating the game with a sequence of polymatrix games [20]. Later, they presented a (non-polynomial) reduction that reduces n-player games to polymatrix games while preserving approximate Nash equilibria [21]. Their reduction introduces a central coordinator player, who interacts bilaterally with every player.

For two-player Bayesian games, Conitzer and Sandholm [8] proved that determining whether a given two-player game has a pure Bayesian Nash equilibrium (BNE) is NP-complete. Austrin et al. [1] extended this hardness result to approximate pure BNE. More specifically, they proved that, even when a Bayesian game is guaranteed to admit a pure BNE, it is NP-hard to compute a pure \(\epsilon \)-BNE for \(\epsilon =0.004\). Moreover, for the special case where the distribution over the types of the players is uniform, they provided a quasi-polynomial-time algorithm for computing a pure \(\epsilon \)-BNE, for any \(\epsilon > 0\). Finally, Rubinstein [27] proved that there is a (very small) constant \(\epsilon \) such that it is \(\mathtt {PPAD}\)-hard to compute any \(\epsilon \)-BNE of a two-player Bayesian game. Our main result is the first non-trivial upper bound on the approximation guarantee for this problem that can be achieved in polynomial time.

2 Preliminaries

We start by fixing some notation. We use [k] to denote the set of integers \(\{1, 2, \ldots , k\}\), and when a universe [k] is clear, we will use \(\bar{S} = \{ i \in [k] : i \notin S\}\) to denote the complement of \(S \subseteq [k]\). For a k-dimensional vector x, we use \(x_{-S}\) to denote the elements of x with indices in \(\bar{S}\), and in the case where \(S = \{i\}\) has only one element, we simply write \(x_{-i}\) for \(x_{-S}\).

Polymatrix games. An n-player polymatrix game is defined by an undirected graph \((V, E)\) with n vertices, where every vertex corresponds to a player. The edges of the graph specify which players interact with each other. For each \(i \in [n]\), we use \(N(i) = \{j \; : \; (i,j) \in E\}\) to denote the neighbors of player i.

Each edge \((i, j) \in E\) specifies that a bimatrix game will be played between players i and j. Each player \(i \in [n]\) has a fixed number of pure strategies \(m_i\), and the bimatrix game on edge \((i, j) \in E\) will therefore be specified by an \(m_i \times m_j\) matrix \(A_{ij}\), which gives the payoffs for player i, and an \(m_j \times m_i\) matrix \(A_{ji}\), which gives the payoffs for player j. We allow the individual payoffs in each matrix to be an arbitrary (even negative) rational number. As we describe in the next subsection, we will rescale these payoffs so that the overall payoff to each player lies in the range [0, 1].
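For concreteness, the following sketch shows one way such a game might be stored in code. It is purely illustrative: the class and its fields are our own naming, not part of the formal development.

```python
import numpy as np

class PolymatrixGame:
    """A minimal container for a polymatrix game (illustrative sketch)."""

    def __init__(self, num_strategies, payoffs):
        # num_strategies[i] = m_i, the number of pure strategies of player i.
        # payoffs[(i, j)] = A_ij, an m_i x m_j numpy array holding player i's
        # payoffs on edge (i, j); both (i, j) and (j, i) are stored per edge.
        self.m = list(num_strategies)
        self.n = len(self.m)
        self.A = payoffs

    def neighbors(self, i):
        # N(i): the players that share an edge with player i.
        return [j for (a, j) in self.A if a == i]
```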

2.1 Payoff Normalisation

Before we continue, we must first discuss how the payoffs in the game are rescaled. It is common, when proving results about additive notions of approximate equilibria, to rescale the payoffs of the game. This is necessary in order for different results to be comparable. For example, all results about additive approximate equilibria in bimatrix games assume that the payoff matrices have entries in the range [0, 1], and therefore an \(\epsilon \)-Nash equilibrium always has a consistent meaning. For the same reason, we must rescale the payoffs in a polymatrix game in order to give a consistent meaning to an \(\epsilon \)-approximation.

An initial, naive, approach would be to specify that each of the individual bimatrix games has entries in the range [0, 1]. This would be sufficient if we were only interested in polymatrix games played on either complete graphs or regular graphs. However, in this model, if the players have differing degrees, then they also have differing maximum payoffs. This means that an additive approximate equilibrium must pay more attention to high degree players, as they can have larger regrets.

One solution to this problem, which was adopted in the conference version of this paper [15], is to rescale according to the degree. That is, given a polymatrix game where each bimatrix game has payoffs in the range [0, 1], if a player has degree d, then each of his payoff matrices is divided by d. This transformation ensures that every player has regret in the range [0, 1], and therefore low degree players are not treated unfairly by additive approximations.

However, rescaling according to the degree assumes that each bimatrix game actually uses the full range of payoffs in [0, 1]. In particular, some bimatrix games may have minimum payoff strictly greater than 0, or maximum payoff strictly less than 1. This issue arises, in particular, in our application to two-player Bayesian games. Note that, unlike the case of a single bimatrix game, we cannot fix this by rescaling individual bimatrix games in a polymatrix game, because we must preserve the relationship between the payoffs in all of the bimatrix games that a player is involved in.

To address this, we will rescale the games so that, for each player, the minimum possible payoff is 0, and the maximum possible payoff is 1. For each player i, we denote by \({U} _i\) the maximum payoff he can obtain, and by \({L} _i\) the minimum payoff he can obtain. Formally:

$$\begin{aligned} {U} _i&:= \max _{p \in [m_i]} \left( \sum _{j \in N(i)} \max _{q \in [m_j]} \big ( A_{ij}(p,q) \big ) \right) ,\\ {L} _i&:= \min _{p \in [m_i]} \left( \sum _{j \in N(i)} \min _{q \in [m_j]} \big ( A_{ij}(p,q) \big ) \right) . \end{aligned}$$

Then, for all i and all \(j \in N(i) \) we will apply the following transformation, which we call \(T_i(\cdot )\), to all the entries z of the payoff matrices \(A_{ij}\):

$$\begin{aligned} T_i(z) = \frac{1}{{U} _i - {L} _i} \cdot \left( z - \frac{{L} _i}{{d(i)}} \right) . \end{aligned}$$

Observe that, since player i’s payoff is the sum of \({d(i)} \) many bimatrix games, it must be the case that after transforming the payoff matrices in this way, player i’s maximum possible payoff is 1, and player i’s minimum possible payoff is 0. For the rest of this paper, we will assume that the payoff matrices given by \(A_{ij}\) are rescaled in this way.
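The rescaling is mechanical to implement. Below is a sketch that applies \(T_i\) to all of player i's matrices, reusing the illustrative representation above and assuming \({U} _i > {L} _i\); it modifies the game in place.

```python
def rescale_player(game, i):
    # Apply T_i to every matrix A_ij of player i, as in the display above.
    nbrs = game.neighbors(i)
    d = len(nbrs)
    # U_i: best pure strategy against the most generous neighbour columns;
    # L_i: worst pure strategy against the least generous neighbour columns.
    U = max(sum(game.A[(i, j)][p].max() for j in nbrs)
            for p in range(game.m[i]))
    L = min(sum(game.A[(i, j)][p].min() for j in nbrs)
            for p in range(game.m[i]))
    for j in nbrs:
        game.A[(i, j)] = (game.A[(i, j)] - L / d) / (U - L)
```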

2.2 Approximate Nash Equilibria

Strategies. A mixed strategy for player i is a probability distribution over player i’s pure strategies. Formally, for each positive integer k, we denote the \((k-1)\)-dimensional simplex by \({\varDelta }_k := \{x: x \in \mathbb {R}^k, x \ge 0, \sum _{i=1}^k x_i = 1\}\), and therefore the set of strategies for player i is \({\varDelta }_{m_i}\). For each mixed strategy \(x \in {\varDelta }_m\), the support of x is defined as \(\mathrm {supp}(x) := \{i \in [m]: x_i \ne 0 \}\), which is the set of strategies played with positive probability by x.

A strategy profile specifies a mixed strategy for every player. We denote the set of mixed strategy profiles as \({\varDelta }:= {\varDelta }_{m_1} \times \ldots \times {\varDelta }_{m_n}\). Given a strategy profile \(\mathbf {x} = (x_1,\ldots ,x_n) \in {\varDelta }\), the payoff of player i under \(\mathbf {x}\) is the sum of the payoffs that he obtains in each of the bimatrix games that he plays. Formally, we define:

$$\begin{aligned} u_i(\mathbf {x}) := x_i^T\sum \limits _{j \in N(i)}A_{ij}x_j. \end{aligned}$$
(1)

We denote by \(u_i\left( x'_i,\mathbf {x} \right) \) the payoff for player i when he plays \(x'_i\) and the other players play according to the strategy profile \(\mathbf {x} \). In some cases the first argument will be \(x_i - x'_i\), which may not correspond to a valid strategy for player i, but we still apply the same formula:

$$\begin{aligned} u_i\left( x_i-x'_i,\mathbf {x} \right) := x_i^T\sum \limits _{j \in N(i)}A_{ij}x_j - x_i'^T\sum \limits _{j \in N(i)}A_{ij}x_j = u_i\left( x_i,\mathbf {x} \right) - u_i\left( x'_i,\mathbf {x} \right) . \end{aligned}$$

Best responses. Let \(v_i\bigl (\mathbf {x} \bigr ) \) be the vector of payoffs for each pure strategy of player i when the rest of players play strategy profile \(\mathbf {x} \). Formally:

$$\begin{aligned} v_i\bigl (\mathbf {x} \bigr ) = \sum _{j \in N(i)} A_{ij}x_j. \end{aligned}$$

For each vector \(x \in \mathbb {R}^m\), we define \(\mathrm {suppmax}(x)\) to be the set of indices that achieve the maximum of x, that is, we define \(\mathrm {suppmax}(x) = \{ i \in [m]: x_i \ge x_j, \forall j\in [m] \}\). Then the set of pure best responses of player i against a strategy profile \(\mathbf {x}\) (where only \(\mathbf {x} _{-i}\) is relevant) is given by:

$$\begin{aligned} {\mathrm {Br}_{i}(\mathbf {x})} = \mathrm {suppmax}\left( \sum _{j \in N(i)} A_{ij}x_j\right) = \mathrm {suppmax}(v_i\bigl (\mathbf {x} \bigr )). \end{aligned}$$
(2)

The corresponding best response payoff is given by:

$$\begin{aligned} u_i^*(\mathbf {x}) = \max _k \left\{ \bigl (\sum _{j \in N(i)} A_{ij}x_j\bigr )_k \right\} = \max _k \left\{ \bigl ( v_i\bigl (\mathbf {x} \bigr ) \bigr )_k \right\} . \end{aligned}$$
(3)

Equilibria. In order to define the exact and approximate equilibria of a polymatrix game, we first define the regret that is suffered by each player under a given strategy profile. The regret function \(f_i:{\varDelta }\rightarrow [0, 1]\) is defined, for each player i, as follows:

$$\begin{aligned} f_i(\mathbf {x}) := u_i^*(\mathbf {x}) - u_i(\mathbf {x}). \end{aligned}$$
(4)

The maximum regret under a strategy profile \(\mathbf {x} \) is given by the function \(f(\mathbf {x})\) where:

$$\begin{aligned} f(\mathbf {x}) := \max \{f_1(\mathbf {x}), \ldots , f_n(\mathbf {x})\}. \end{aligned}$$
(5)

We say that \(\mathbf {x} \) is an \(\epsilon \)-approximate Nash equilibrium (\(\epsilon \)-NE) if we have:

$$\begin{aligned} f(\mathbf {x}) \le \epsilon , \end{aligned}$$

and \(\mathbf {x} \) is an exact Nash equilibrium if we have \(f(\mathbf {x}) = 0\).
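All of the quantities in Eqs. (1)-(5) can be computed directly from the payoff matrices. A sketch, continuing the illustrative representation from above (here x is a list of mixed strategies, one numpy vector per player):

```python
def payoff_vector(game, i, x):      # v_i(x) = sum over j in N(i) of A_ij x_j
    return sum(game.A[(i, j)] @ x[j] for j in game.neighbors(i))

def regret(game, i, x):             # f_i(x) = u_i*(x) - u_i(x), Eq. (4)
    v = payoff_vector(game, i, x)
    return v.max() - x[i] @ v       # u_i*(x) = max_k v_k; u_i(x) = x_i^T v

def max_regret(game, x):            # f(x) = max_i f_i(x), Eq. (5)
    return max(regret(game, i, x) for i in range(game.n))

def is_approx_ne(game, x, eps):     # the epsilon-NE condition f(x) <= eps
    return max_regret(game, x) <= eps
```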

3 The Gradient

Our goal is to apply gradient descent to the regret function f. In this section, we formally define the gradient of f in Definition 1, and give a combinatorial version of that definition in Definition 3. In order to show that our gradient descent method terminates after a polynomial number of iterations, we actually need to use a slightly modified version, which we describe at the end of this section in Definition 5.

Given a point \(\mathbf {x} \in {\varDelta }\), a feasible direction from \(\mathbf {x} \) is defined by any other point \(\mathbf {x} ' \in {\varDelta }\). This defines a line between \(\mathbf {x} \) and \(\mathbf {x} '\), and formally speaking, the direction of this line is \(\mathbf {x} ' - \mathbf {x} \). In order to define the gradient of this direction, we consider the function \(f((1-\epsilon )\cdot \mathbf {x} + \epsilon \cdot \mathbf {x} ') - f(\mathbf {x})\) where \(\epsilon \) lies in the range \(0 \le \epsilon \le 1\). The gradient of this direction is given in the following definition.

Definition 1

Given profiles \(\mathbf {x},\mathbf {x} ' \in {\varDelta }\) and \(\epsilon \in [0,1]\), we define:

$$\begin{aligned} Df(\mathbf {x},\mathbf {x} ', \epsilon )&:= f((1 - \epsilon ) \cdot \mathbf {x} + \epsilon \cdot \mathbf {x} ') - f(\mathbf {x}). \end{aligned}$$

Then, we define the gradient of f at \(\mathbf {x} \) in the direction \(\mathbf {x} '-\mathbf {x} \) as:

$$\begin{aligned} Df(\mathbf {x},\mathbf {x} ') = \lim \limits _{\epsilon \rightarrow 0}\frac{1}{\epsilon }Df(\mathbf {x},\mathbf {x} ', \epsilon ). \end{aligned}$$
(6)

The gradient of f at any point \(\mathbf {x} \in {\varDelta }\) along a feasible direction specified by another point \(\mathbf {x} ' \in {\varDelta }\) provides the rate of decrease, or increase, of the value of f along that direction. At any point \(\mathbf {x} \) we wish to find the direction such that f decreases with the highest rate, that is, we want to find the point \(\mathbf {x} '\) that minimizes \(Df(\mathbf {x}, \mathbf {x} ')\), and move along the direction \(\mathbf {x} ' - \mathbf {x} \), or to find that \(\mathbf {x} \) is a stationary point, i.e. \(Df(\mathbf {x}, \mathbf {x} ') \ge 0\) for all \(\mathbf {x} ' \in {\varDelta }\). Unfortunately, Eq. (6) cannot be used directly in an algorithm. Instead, in Definition 3 we provide a combinatorial version of the gradient that allows us to compute the steepest descent direction, with respect to the combinatorial gradient, via a linear program.

The intuition for the combinatorial version comes from Eq. (6). Let us define \(\bar{\mathbf {x}}:= (1 - \epsilon ) \cdot \mathbf {x} + \epsilon \cdot \mathbf {x} '\). From the natural gradient defined in Definition 1, we get that:

$$\begin{aligned} Df(\mathbf {x},\mathbf {x} ')&= \lim _{\epsilon \rightarrow 0}\frac{1}{\epsilon } \bigl ( f(\bar{\mathbf {x}}) - f(\mathbf {x}) \bigr ) \nonumber \\&= \lim _{\epsilon \rightarrow 0}\frac{1}{\epsilon } \left( \max _{i \in [n]}f_i(\bar{\mathbf {x}}) - f(\mathbf {x}) \right) \nonumber \\&= \max _{i \in [n]} \left( \lim _{\epsilon \rightarrow 0}\frac{1}{\epsilon } \bigl ( f_i(\bar{\mathbf {x}}) - f(\mathbf {x}) \bigr ) \right) . \end{aligned}$$
(7)

In “Appendix 1” we study the limit \(\lim _{\epsilon \rightarrow 0}\frac{1}{\epsilon } \bigl ( f_i(\bar{\mathbf {x}}) - f(\mathbf {x}) \bigr )\), and we prove that it is equal to the following combinatorial version. Before we state the result we introduce some useful notation. Given profiles \(\mathbf {x}\) and \(\mathbf {x} '\) let us denote:

$$\begin{aligned} Df_i(\mathbf {x}, \mathbf {x} ') = \max _{k \in {\mathrm {Br}_{i}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\mathbf {x} '\bigr ) \bigr )_k \right\} - u_i\left( x_i,\mathbf {x} '\right) + u_i\left( x_i-x'_i,\mathbf {x} \right) . \end{aligned}$$
(8)

The above expression arises from expanding \(f_i(\bar{\mathbf {x}}) - f(\mathbf {x})\). The terms above are all multiplied by \(\epsilon \) in the expansion, whereas the remaining terms all tend to zero when the limit is taken. The following lemma is proved in “Appendix 1”.

Lemma 2

Let \(\mathbf {x}\) be a strategy profile and \(i \in [n]\). If \(f_i(\mathbf {x}) = f(\mathbf {x})\), then:

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\frac{1}{\epsilon } \bigl ( f_i(\bar{\mathbf {x}}) - f(\mathbf {x}) \bigr ) = Df_i(\mathbf {x}, \mathbf {x} ') - f(\mathbf {x}), \end{aligned}$$

otherwise \(\lim _{\epsilon \rightarrow 0}\frac{1}{\epsilon } \bigl ( f_i(\bar{\mathbf {x}}) - f(\mathbf {x}) \bigr ) =-\infty \).

Combining Eq. (7) with Lemma 2 gives the following combinatorial version of the gradient that we will use throughout the rest of the paper.

Definition 3

(Combinatorial gradient) The gradient of f at point \(\mathbf {x} \) along direction \(\mathbf {x} ' - \mathbf {x} \) is:

$$\begin{aligned} Df(\mathbf {x}, \mathbf {x} ') = \max _{i \in [n]} Df_i (\mathbf {x}, \mathbf {x} ') - f(\mathbf {x}). \end{aligned}$$

In order to show that our gradient descent algorithm terminates after a polynomial number of steps, we have to use a slight modification of the formula given in Definition 3. More precisely, in \(Df_i(\mathbf {x}, \mathbf {x} ')\), we need to take the maximum over the \(\delta \)-best responses, rather than the best responses.

We begin by providing the definition of the \(\delta \)-best responses.

Definition 4

(\(\delta \) -best response) Let \(\mathbf {x} \in {\varDelta }\), and let \(\delta \in (0, 0.5]\). The \(\delta \)-best response set \({\mathrm {Br^{\delta }_{i}}(\mathbf {x})} \) for player \(i \in [n]\) is defined as:

$$\begin{aligned} {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}:= \left\{ j \in [m_i] : \bigl ( v_i\bigl (\mathbf {x} \bigr ) \bigr )_j \ge u_i^*(\mathbf {x}) - \delta \right\} . \end{aligned}$$

We now define the function \(Df^{\delta }_i(\mathbf {x},\mathbf {x} ')\).

Definition 5

Let \(\mathbf {x}, \mathbf {x} ' \in {\varDelta }\), let \(\epsilon \in [0, 1]\), and let \(\delta \in (0, 0.5]\). We define \(Df^{\delta }_i(\mathbf {x},\mathbf {x} ')\) as:

$$\begin{aligned} Df^{\delta }_i(\mathbf {x},\mathbf {x} ') := \max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\mathbf {x} '\bigr ) \bigr )_k \right\} - u_i\left( x_i,\mathbf {x} '\right) - u_i\left( x'_i,\mathbf {x} \right) + u_i\left( x_i,\mathbf {x} \right) . \end{aligned}$$
(9)

Furthermore, we define \(Df^{\delta }(\mathbf {x}, \mathbf {x} ')\) as:

$$\begin{aligned} Df^{\delta }(\mathbf {x}, \mathbf {x} ') = \max _{i \in [n]} Df^{\delta }_i (\mathbf {x}, \mathbf {x} ') - f(\mathbf {x}). \end{aligned}$$
(10)

Our algorithm works by performing gradient descent using the function \(Df^{\delta }\) as the gradient. Obviously, this is a different function to Df, and so we are not actually performing gradient descent on the gradient of f. It is important to note that all of our proofs are in terms of \(Df^{\delta }\), and so this does not affect the correctness of our algorithm. We proved Lemma 2 in order to explain where our definition of the combinatorial gradient comes from, but the correctness of our algorithm does not depend on the correctness of Lemma 2.
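Definitions 4 and 5 translate directly into code. The following sketch reuses payoff_vector and max_regret from the earlier illustrative sketches:

```python
def delta_best_responses(game, i, x, delta):    # Definition 4
    v = payoff_vector(game, i, x)
    return [k for k in range(game.m[i]) if v[k] >= v.max() - delta]

def Df_delta_i(game, i, x, xp, delta):          # Eq. (9)
    v_x = payoff_vector(game, i, x)             # v_i(x)
    v_xp = payoff_vector(game, i, xp)           # v_i(x')
    best = max(v_xp[k] for k in delta_best_responses(game, i, x, delta))
    # max over delta-BRs - u_i(x_i, x') - u_i(x'_i, x) + u_i(x_i, x):
    return best - x[i] @ v_xp - xp[i] @ v_x + x[i] @ v_x

def Df_delta(game, x, xp, delta):               # Eq. (10)
    return (max(Df_delta_i(game, i, x, xp, delta) for i in range(game.n))
            - max_regret(game, x))
```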

4 The Algorithm

In this section, we describe our algorithm for finding a \((0.5+\delta )\)-Nash equilibrium in a polymatrix game by gradient descent. In each iteration of the algorithm, we must find the direction of steepest descent with respect to \(Df^{\delta }\). We show that this task can be achieved by solving a linear program, and we then use this LP to formally specify our algorithm.

The direction of steepest descent. We show that the direction of steepest descent can be found by solving a linear program. Our goal is, for a given strategy profile \(\mathbf {x} \), to find another strategy profile \(\mathbf {x} '\) so as to minimize the gradient \(Df^{\delta }(\mathbf {x}, \mathbf {x} ')\). Recall that \(Df^{\delta }\) is defined in Eq. (10) to be:

$$\begin{aligned} Df^{\delta }(\mathbf {x}, \mathbf {x} ') = \max _{i \in [n]} Df^{\delta }_i (\mathbf {x}, \mathbf {x} ') - f(\mathbf {x}). \end{aligned}$$

Note that the term \(f(\mathbf {x})\) is a constant in this expression, because it is the same for all directions \(\mathbf {x} '\). Thus, it is sufficient to formulate a linear program in order to find the \(\mathbf {x} '\) that minimizes \(\max _{i \in [n]} Df^{\delta }_i (\mathbf {x}, \mathbf {x} ')\). Using the definition of \(Df^{\delta }_i\) in Eq. (9), we can do this as follows.

Definition 6

(Steepest descent linear program) Given a strategy profile \(\mathbf {x} \), the steepest descent linear program is defined as follows. Find \(\mathbf {x} ' \in {\varDelta }\), \(l_1,l_2,\ldots ,l_n\), and w such that:

$$\begin{aligned} \text {minimize} \quad&w&\\ \text {subject to} \quad&\bigl ( v_i\bigl (\mathbf {x} '\bigr ) \bigr )_k \le l_i&\text {for all } i \in [n] \text { and all } k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})},\\&l_i - u_i\left( x_i,\mathbf {x} '\right) - u_i\left( x'_i,\mathbf {x} \right) + u_i\left( x_i,\mathbf {x} \right) \le w&\text {for all } i \in [n]. \end{aligned}$$

The \(l_i\) variables deal with the maximum in the term \(\max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \bigl \{ \bigl ( v_i\bigl (\mathbf {x} '\bigr ) \bigr )_k \bigr \}\), while the variable w is used to deal with the maximum over the functions \(Df^{\delta }_i\). Since the constraints of the linear program correspond precisely to the definition of \(Df^{\delta }\), it is clear that, when we minimize w, the resulting \(\mathbf {x} '\) specifies the direction of steepest descent. For each profile \(\mathbf {x} \), we define \(Q(\mathbf {x})\) to be the direction \(\mathbf {x} '\) found by the steepest descent LP for \(\mathbf {x} \).
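The LP can be assembled mechanically from Eq. (9). The sketch below builds and solves it with scipy.optimize.linprog; the variable layout (the blocks of \(\mathbf {x} '\) first, then \(l_1, \ldots , l_n\), then w) and all helper names are illustrative, reusing the earlier sketches.

```python
import numpy as np
from scipy.optimize import linprog

def steepest_descent_direction(game, x, delta):
    n, m = game.n, game.m
    off = np.cumsum([0] + list(m))      # start index of each x'_i block
    M = off[-1]                         # total number of x' variables
    nvar = M + n + 1                    # x' blocks, then l_1..l_n, then w
    c = np.zeros(nvar)
    c[-1] = 1.0                         # objective: minimise w

    A_ub, b_ub = [], []
    for i in range(n):
        v = payoff_vector(game, i, x)   # v_i(x)
        # (v_i(x'))_k <= l_i for every delta-best response k of player i.
        for k in delta_best_responses(game, i, x, delta):
            row = np.zeros(nvar)
            for j in game.neighbors(i):
                row[off[j]:off[j + 1]] += game.A[(i, j)][k]
            row[M + i] = -1.0
            A_ub.append(row)
            b_ub.append(0.0)
        # l_i - u_i(x_i, x') - u_i(x'_i, x) + u_i(x_i, x) <= w.
        row = np.zeros(nvar)
        for j in game.neighbors(i):
            row[off[j]:off[j + 1]] -= x[i] @ game.A[(i, j)]
        row[off[i]:off[i + 1]] -= v
        row[M + i] = 1.0
        row[-1] = -1.0
        A_ub.append(row)
        b_ub.append(-(x[i] @ v))

    # Each block x'_i must be a probability distribution.
    A_eq = np.zeros((n, nvar))
    for i in range(n):
        A_eq[i, off[i]:off[i + 1]] = 1.0
    bounds = [(0, 1)] * M + [(None, None)] * (n + 1)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=np.ones(n), bounds=bounds)
    return [res.x[off[i]:off[i + 1]] for i in range(n)]
```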

Once we have found the direction of steepest descent, we then need to move in that direction. More precisely, we fix a parameter \(\epsilon = \frac{\delta }{\delta + 2}\) which is used to determine how far we move in the steepest descent direction. We derive this value for \(\epsilon \) in Lemma 24 in “Appendix 2”. The choice of this value for \(\epsilon \) ensures that in every iteration of our algorithm the value of f is decreasing and moreover, as we will show in Sect. 6, leads to a polynomial bound on the running time of our algorithm.

The algorithm. We can now formally describe our algorithm. The algorithm takes a parameter \( \delta \in (0, 0.5]\), which will be used as a tradeoff between running time and the quality of approximation.

Algorithm 1 (parameter \(\delta \in (0, 0.5]\)):

  1. Choose an arbitrary initial strategy profile \(\mathbf {x} \in {\varDelta }\).
  2. Compute the steepest descent direction \(\mathbf {x} ' := Q(\mathbf {x})\) by solving the linear program of Definition 6.
  3. Set \(\mathbf {x} := \mathbf {x} + \epsilon \cdot (\mathbf {x} ' - \mathbf {x})\), where \(\epsilon = \frac{\delta }{\delta + 2}\).
  4. If \(f(\mathbf {x}) \le 0.5 + \delta \), then output \(\mathbf {x} \) and stop; otherwise, go to step 2.

A single iteration of this algorithm corresponds to executing steps 2, 3, and 4. Since this only involves solving a single linear program, it is clear that each iteration can be completed in polynomial time.
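Assembling the helper functions from the previous sketches, the whole algorithm is a short loop. The sketch below is illustrative, with the uniform profile as one arbitrary choice of starting point:

```python
def approximate_ne(game, delta):
    # Algorithm 1: gradient descent on f with step size delta / (delta + 2).
    eps = delta / (delta + 2)
    # Step 1: an arbitrary starting profile; uniform mixing is one choice.
    x = [np.ones(mi) / mi for mi in game.m]
    # Steps 2-4: by Theorem 7 this loop runs O(1 / delta^2) times.
    while max_regret(game, x) > 0.5 + delta:
        xp = steepest_descent_direction(game, x, delta)
        x = [(1 - eps) * xi + eps * xpi for xi, xpi in zip(x, xp)]
    return x
```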

The rest of this paper is dedicated to showing the following theorem, which is our main result.

Theorem 7

Algorithm 1 finds a \((0.5+\delta )\)-NE after at most \(O(\frac{1}{\delta ^2})\) iterations.

To prove Theorem 7, we will show two properties. Firstly, in Sect. 5, we show that our gradient descent algorithm never gets stuck in a stationary point before it finds a \((0.5 + \delta )\)-NE. To do so, we define the notion of a \(\delta \)-stationary point, and we show that every \(\delta \)-stationary point is a \((0.5 + \delta )\)-NE, which then directly implies that the gradient descent algorithm will not get stuck before it finds a \((0.5 + \delta )\)-NE.

Secondly, in Sect. 6, we prove the upper bound on the number of iterations. To do this we show that, if an iteration of the algorithm starts at a point that is not a \(\delta \)-stationary point, then that iteration will make a large enough amount of progress. This then allows us to show that the algorithm will find a \((0.5 + \delta )\)-NE after \(O(\frac{1}{\delta ^2})\) many iterations, and therefore the overall running time of the algorithm is polynomial.

5 Stationary Points

Recall that Definition 6 gives a linear program for finding the direction \(\mathbf {x} '\) that minimises \(Df^{\delta }(\mathbf {x},\mathbf {x} ')\). Our steepest descent procedure is able to make progress whenever this gradient is negative, and so a stationary point is any point \(\mathbf {x} \) for which \(Df^{\delta }(\mathbf {x}, \mathbf {x} ') \ge 0\) for all \(\mathbf {x} ' \in {\varDelta }\). In fact, our analysis requires us to consider \(\delta \)-stationary points, which we now define.

Definition 8

(\(\delta \) -stationary point) Let \(\mathbf {x} ^*\) be a mixed strategy profile, and let \(\delta > 0\). We have that \(\mathbf {x} ^*\) is a \(\delta \)-stationary point if for all \(\mathbf {x} ' \in {\varDelta }\):

$$\begin{aligned} Df^{\delta }(\mathbf {x} ^*, \mathbf {x} ') \ge - \delta . \end{aligned}$$

We now show that every \(\delta \)-stationary point of \(f(\mathbf {x})\) is a \((0.5+\delta )\)-NE. Recall from Definition 5 that:

$$\begin{aligned} Df^{\delta }(\mathbf {x}, \mathbf {x} ') = \max _{i \in [n]} Df^{\delta }_i (\mathbf {x}, \mathbf {x} ') - f(\mathbf {x}). \end{aligned}$$

Therefore, if \(\mathbf {x} ^*\) is a \(\delta \)-stationary point, we must have, for every direction \(\mathbf {x} '\):

$$\begin{aligned} f(\mathbf {x} ^*) \le \max _{i \in [n]} Df^{\delta }_i (\mathbf {x} ^*, \mathbf {x} ') + \delta . \end{aligned}$$
(11)

Since \(f(\mathbf {x} ^*)\) is the maximum regret under the strategy profile \(\mathbf {x} ^*\), in order to show that \(\mathbf {x} ^*\) is a \((0.5 + \delta )\)-NE, we only have to find some direction \(\mathbf {x} '\) such that \(\max _{i \in [n]} Df^{\delta }_i (\mathbf {x} ^*, \mathbf {x} ') \le 0.5\). We do this in the following lemma.

Lemma 9

For every point \(\mathbf {x} \), there exists a direction \(\mathbf {x} '\) such that:

$$\begin{aligned} \max _{i \in [n]} Df^{\delta }_i (\mathbf {x}, \mathbf {x} ') \le 0.5. \end{aligned}$$

Proof

First, define \(\bar{\mathbf {x}}\) to be a strategy profile in which each player \(i \in [n]\) plays a best response against \(\mathbf {x} \). We will set \(\mathbf {x} ' = \frac{\bar{\mathbf {x}}+\mathbf {x}}{2}\). Then for each \(i \in [n]\), we have that \(Df^{\delta }_i(\mathbf {x}, \mathbf {x} ')\) is less than or equal to:

$$\begin{aligned}&\max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\frac{\bar{\mathbf {x}}+\mathbf {x}}{2}\bigr ) \bigr )_k \right\} - u_i\left( x_i,\frac{\bar{\mathbf {x}}+\mathbf {x}}{2}\right) - u_i\left( \frac{\bar{x_i}+x_i}{2},\mathbf {x} \right) + u_i\left( x_i,\mathbf {x} \right) \\&\quad = \frac{1}{2} \cdot \max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\bar{\mathbf {x}}+\mathbf {x} \bigr ) \bigr )_k \right\} - \frac{1}{2} \cdot u_i\left( x_i,\bar{\mathbf {x}}\right) - \frac{1}{2} \cdot u_i\left( \bar{x_i},\mathbf {x} \right) \\&\quad \le \frac{1}{2} \cdot \left( \max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\bar{\mathbf {x}}\bigr ) \bigr )_k \right\} + \max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\mathbf {x} \bigr ) \bigr )_k \right\} - u_i\left( x_i,\bar{\mathbf {x}}\right) - u_i\left( \bar{x_i},\mathbf {x} \right) \right) \\&\quad = \frac{1}{2} \cdot \left( \max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\bar{\mathbf {x}}\bigr ) \bigr )_k \right\} - u_i\left( x_i,\bar{\mathbf {x}}\right) \right) \quad \text {because } \bar{x_i} \text { is a b.r. to } \mathbf {x} \\&\quad \le \frac{1}{2} \cdot \max _{k \in {\mathrm {Br^{\delta }_{i}}(\mathbf {x})}} \left\{ \bigl ( v_i\bigl (\bar{\mathbf {x}}\bigr ) \bigr )_k \right\} \\&\quad \le \frac{1}{2}. \end{aligned}$$

Thus, the point \(\mathbf {x} '\) satisfies \(\max _{i \in [n]} Df^{\delta }_i (\mathbf {x}, \mathbf {x} ') \le 0.5\). \(\square \)
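The direction used in this proof is easy to construct explicitly, which gives a convenient numerical sanity check: on a rescaled game, \(Df^{\delta }_i(\mathbf {x}, \mathbf {x} ')\), as computed by the illustrative Df_delta_i sketch above, should not exceed 0.5 for any player. A sketch of the construction:

```python
def lemma9_direction(game, x):
    # xbar: each player plays some pure best response against x; any best
    # response works for the argument in the proof.
    xbar = []
    for i in range(game.n):
        b = np.zeros(game.m[i])
        b[int(np.argmax(payoff_vector(game, i, x)))] = 1.0
        xbar.append(b)
    # x' = (xbar + x) / 2, taken player by player.
    return [(bi + xi) / 2 for bi, xi in zip(xbar, x)]
```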

We can sum up the results of the section in the following lemma.

Lemma 10

Every \(\delta \)-stationary point \(\mathbf {x} ^*\) is a \((0.5 + \delta )\)-Nash equilibrium.

6 The Time Complexity of the Algorithm

In this section, we show that Algorithm 1 terminates after a polynomial number of iterations. Let \(\mathbf {x} \) be a strategy profile that is considered by Algorithm 1, and let \(\mathbf {x} ' = Q(\mathbf {x})\) be the solution of the steepest descent LP for \(\mathbf {x} \). These two profiles will be fixed throughout this section.

We begin by proving a technical lemma that will be crucial for showing our bound on the number of iterations. To simplify our notation, throughout this section we define \(f_{new} := f(\mathbf {x} + \epsilon (\mathbf {x} ' - \mathbf {x}))\) and \(f := f(\mathbf {x})\). Furthermore, we define \(\mathcal {D} = \max _{i \in [n]} Df^{\delta }_i(\mathbf {x}, \mathbf {x} ')\). The following lemma, which is proved in “Appendix 2”, gives a relationship between f and \(f_{new}\).

Lemma 11

In every iteration of Algorithm 1 we have:

$$\begin{aligned} f_{new} - f \le \epsilon ( \mathcal {D} - f) + \epsilon ^2 (1 - \mathcal {D}). \end{aligned}$$
(12)

In the next lemma we prove that, if we are not in a \(\delta \)-stationary point, then we have a bound on the amount of progress made in each iteration. We use this in order to bound the number of iterations needed before we reach a point \(\mathbf {x}\) where \(f(\mathbf {x}) \le 0.5 + \delta \).

Lemma 12

Fix \(\epsilon = \frac{\delta }{\delta + 2}\), where \(0 < \delta \le 0.5\). Either \(\mathbf {x}\) is a \(\delta \)-stationary point or:

$$\begin{aligned} f_{new} \le \left( 1 - \left( \frac{\delta }{\delta + 2} \right) ^2 \right) f. \end{aligned}$$
(13)

Proof

Recall that by Lemma 11 the change in the value of f in every iteration of the steepest descent procedure satisfies:

$$\begin{aligned} f_{new} - f \le \epsilon (\mathcal {D} - f) + \epsilon ^2(1 - \mathcal {D}). \end{aligned}$$
(14)

We consider the following two cases:

  1. (a)

\(\mathcal {D} - f > -\delta \). Then \(\mathbf {x} \) is, by definition, a \(\delta \)-stationary point: since \(\mathbf {x} '\) minimises \(\max _{i \in [n]} Df^{\delta }_i(\mathbf {x}, \cdot )\), we have \(Df^{\delta }(\mathbf {x}, \mathbf {x} '') \ge \mathcal {D} - f > -\delta \) for every \(\mathbf {x} '' \in {\varDelta }\).

  2. (b)

    \(\mathcal {D} - f \le -\delta \). We have set \(\epsilon = \frac{\delta }{\delta + 2}\). If we solve for \(\delta \) we get that \(\delta = \frac{2\epsilon }{1-\epsilon }\). Since \(\mathcal {D} - f \le -\delta \), we have that \((\mathcal {D} - f)(1-\epsilon ) \le -2\epsilon \). Thus we have:

    $$\begin{aligned} (\mathcal {D} - f)(\epsilon - 1)&\ge 2\epsilon \\ 0&\ge (\mathcal {D} - f)(1 - \epsilon ) + 2\epsilon \\ 0&\ge (\mathcal {D} - f) + \epsilon (2 - \mathcal {D} + f) \\ -\epsilon f - \epsilon&\ge (\mathcal {D} - f) + \epsilon (1- \mathcal {D}) \qquad (\epsilon \ge 0)\\ -\epsilon ^2 f - \epsilon ^2&\ge \epsilon (\mathcal {D} - f) + \epsilon ^2(1- \mathcal {D}). \end{aligned}$$

    Thus, since \(\epsilon ^2 \ge 0\) we get:

    $$\begin{aligned} -\epsilon ^2 f&\ge \epsilon (\mathcal {D} - f) + \epsilon ^2(1- \mathcal {D})\\&\ge f_{new} - f&\text {According to (14).} \end{aligned}$$

    Thus we have shown that:

    $$\begin{aligned} f_{new} - f \le&-\epsilon ^2 f \\ f_{new} \le&(1-\epsilon ^2)f. \end{aligned}$$

    Finally, using the fact that \(\epsilon = \frac{\delta }{\delta + 2}\), we get that

    $$\begin{aligned} f_{new} \le&\left( 1 - \left( \frac{\delta }{\delta + 2} \right) ^2 \right) f. \end{aligned}$$

\(\square \)

So, when the algorithm has not yet reached a \(\delta \)-stationary point, in every iteration of the gradient descent procedure the value of f decreases by at least the amount specified in (13). In the following lemma we prove that after \(O(\frac{1}{\delta ^2})\) iterations of the steepest descent procedure the algorithm finds a point \(\mathbf {x}\) where \(f(\mathbf {x})\le 0.5 + \delta \).

Lemma 13

After \(O(\frac{1}{\delta ^2})\) iterations of the steepest descent procedure the algorithm finds a point \(\mathbf {x}\) where \(f(\mathbf {x}) \le 0.5 + \delta \).

Proof

Let \(\mathbf {x} _1\), \(\mathbf {x} _2\), \(\dots \), \(\mathbf {x} _k\) be the sequence of strategy profiles that are considered by Algorithm 1. Since the algorithm terminates as soon as it finds a \((0.5 + \delta )\)-NE, we have \(f(\mathbf {x} _i) > 0.5 + \delta \) for every \(i < k\). Therefore, for each \(i < k\) we can apply Lemma 10 to argue that \(\mathbf {x} _i\) is not a \(\delta \)-stationary point, which then allows us to apply Lemma 12 to obtain:

$$\begin{aligned} f(\mathbf {x} _{i+1}) \le \left( 1 - \left( \frac{\delta }{\delta + 2} \right) ^2 \right) f(\mathbf {x} _i). \end{aligned}$$

So, the amount of progress made by the algorithm in iteration i is:

$$\begin{aligned} f(\mathbf {x} _i) - f(\mathbf {x} _{i+1})&\ge f(\mathbf {x} _i) - \left( 1 - \left( \frac{\delta }{\delta + 2} \right) ^2 \right) f(\mathbf {x} _i) \\&= \left( \frac{\delta }{\delta + 2} \right) ^2 f(\mathbf {x} _i) \\&\ge \left( \frac{\delta }{\delta + 2} \right) ^2 \cdot 0.5. \end{aligned}$$

Thus, each iteration of the algorithm decreases the regret by at least \((\frac{\delta }{\delta + 2})^2 \cdot 0.5\). The algorithm starts at a point \(\mathbf {x} _1\) with \(f(\mathbf {x} _1) \le 1\), and terminates when it reaches a point \(\mathbf {x} _k\) with \(f(\mathbf {x} _k) \le 0.5 + \delta \). Thus the total amount of progress made over all iterations of the algorithm can be at most \(1 - (0.5 + \delta )\). Therefore, the number of iterations used by the algorithm can be at most:

$$\begin{aligned} \frac{1 - (0.5 + \delta )}{ \left( \frac{\delta }{\delta + 2}\right) ^2 \cdot 0.5}&\le \frac{1 - 0.5}{ \left( \frac{\delta }{\delta + 2}\right) ^2 \cdot 0.5} \\&= \frac{(\delta + 2)^2}{\delta ^2} = \frac{\delta ^2}{\delta ^2} + \frac{4\delta }{\delta ^2} + \frac{4}{\delta ^2}. \end{aligned}$$

Since \(\delta < 1\), we have that the algorithm terminates after at most \(O(\frac{1}{\delta ^2})\) iterations. \(\square \)
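For a concrete instance of this bound, taking \(\delta = 0.1\) gives at most

$$\begin{aligned} \frac{1 - (0.5 + \delta )}{\left( \frac{\delta }{\delta + 2}\right) ^2 \cdot 0.5} \le \frac{0.5}{\left( \frac{0.1}{2.1}\right) ^2 \cdot 0.5} = \left( \frac{2.1}{0.1}\right) ^2 = 441 \end{aligned}$$

iterations, each of which solves a single linear program.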

Lemma 13 implies that after polynomially many iterations the algorithm finds a point such that \(f(\mathbf {x}) \le 0.5 +\delta \), and by definition such a point is a \((0.5 + \delta )\)-NE. Thus we have completed the proof of Theorem 7.

7 Application: Two-Player Bayesian Games

In this section, we define two-player Bayesian games, and show how our algorithm can be applied in order to efficiently find a \((0.5+\delta )\)-Bayesian Nash equilibrium. A two-player Bayesian game is played between a row player and a column player. Each player has a set of possible types, and at the start of the game, each player is assigned a type by drawing from a known joint probability distribution. Each player learns his type, but not the type of his opponent. Our task is to find an approximate Bayesian Nash equilibrium (BNE).

We show that this can be reduced to the problem of finding an \(\epsilon \)-NE in a polymatrix game, and therefore our algorithm can be used to efficiently find a \((0.5+\delta )\)-BNE of a two-player Bayesian game. This section is split into two parts. In the first part we formally define two-player Bayesian games, and approximate Bayesian Nash equilibria. In the second part, we give the reduction from two-player Bayesian games to polymatrix games.

7.1 Definitions

Payoff matrices. We will use \(k_1\) to denote the number of pure strategies of the row player and \(k_2\) to denote the number of pure strategies of the column player. Furthermore, we will use m to denote the number of types of the row player, and n to denote the number of types of the column player.

For each pair of types \(i \in [m]\) and \(j \in [n]\), there is a \(k_1 \times k_2\) bimatrix game \((R,C)_{ij} := (R_{ij}, C_{ij})\) that is played when the row player has type i and the column player has type j. We assume that all payoffs in every matrix \(R_{ij}\) and every matrix \(C_{ij}\) lie in the range [0, 1].

Types. The distribution over types is specified by a joint probability distribution: for each pair of types \(i \in [m]\) and \(j \in [n]\), the probability that the row player is assigned type i and the column player is assigned type j is given by \(p_{ij}\). Obviously, we have that:

$$\begin{aligned} \sum _{i=1}^m \sum _{j=1}^n p_{ij} = 1. \end{aligned}$$

We also define some useful shorthands: for each \(i \in [m]\) (respectively \(j \in [n]\)), we denote by \(p^R_i\) (respectively \(p^C_j\)) the probability that the row (column) player has type i (j). Formally:

$$\begin{aligned} p^R_i = \sum _{j=1}^n p_{ij} \qquad \text {for all }i \in [m],\\ p^C_j = \sum _{i=1}^m p_{ij} \qquad \text {for all }j \in [n]. \end{aligned}$$

Note that \(\sum _{i=1}^m p^R_i = \sum _{j=1}^n p^C_j = 1\). Furthermore, we denote by \(p^R_i(j)\) the conditional probability that the column player has type \(j \in [n]\) given that the row player has type i. Similarly, we define \(p^C_j(i)\) for the column player. Formally:

$$\begin{aligned} p^R_i(j) = \frac{p_{ij}}{p^R_i} \qquad \text {for all }i \in [m] \text { and } j \in [n],\\ p^C_j(i) = \frac{p_{ij}}{p^C_j} \qquad \text {for all }i \in [m] \text { and } j \in [n]. \end{aligned}$$

We can see that for every pair of types \((i,j)\) we have \(p_{ij} = p^R_i \cdot p^R_i(j) = p^C_j \cdot p^C_j(i)\).
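These marginals and conditionals are simple to compute from the joint distribution. A sketch, assuming the joint distribution is given as an \(m \times n\) numpy array P with strictly positive marginals:

```python
import numpy as np

def type_distributions(P):
    # P[i, j] = p_ij, the joint probability of row type i and column type j.
    pR = P.sum(axis=1)         # p^R_i: the row player's marginal over types
    pC = P.sum(axis=0)         # p^C_j: the column player's marginal over types
    condR = P / pR[:, None]    # condR[i, j] = p^R_i(j) = p_ij / p^R_i
    condC = P / pC[None, :]    # condC[i, j] = p^C_j(i) = p_ij / p^C_j
    return pR, pC, condR, condC
```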

Strategies. In order to play a Bayesian game, each player must specify a strategy for each of their types. Thus, a strategy profile is a pair \((\mathbf {x}, \mathbf {y})\), where \(\mathbf {x} = (x_1, x_2, \dots , x_{m})\) such that each \(x_i \in {\varDelta }_{k_1}\), and where \(\mathbf {y} = (y_1, y_2, \dots , y_{n})\) such that each \(y_j \in {\varDelta }_{k_2}\). This means that, when the row player gets type \(i \in [m]\) and the column player gets type \(j \in [n]\), then the game \((R_{ij}, C_{ij})\) will be played, and the row player will use strategy \(x_i\) while the column player will use strategy \(y_j\).

Given a strategy profile \((\mathbf {x}, \mathbf {y})\), we can define the expected payoff to both players (recall that the players are not told their opponent’s type).

Definition 14

(Expected payoff) Given a strategy profile \((\mathbf {x}, \mathbf {y})\), the expected payoff for the row player when he has type \(i \in [m]\) is given by:

$$\begin{aligned} u_R(x_i, \mathbf {y})&= \sum _{j=1}^n p^R_i(j) \cdot x^T_i R_{ij}y_j, \\&= x^T_i \sum _{j=1}^n p^R_i(j) \cdot R_{ij}y_j. \end{aligned}$$

Similarly, the expected payoff for the column player when he has type \(j \in [n]\) is:

$$\begin{aligned} u_C(\mathbf {x}, y_j)&= y^T_j \sum _{i=1}^m p^C_j(i) \cdot C_{ij}^T x_i. \end{aligned}$$

Rescaling. Before we define approximate equilibria for two-player Bayesian games, we first rescale the payoffs. Much like for polymatrix games, rescaling is needed to ensure that an \(\epsilon \)-approximate equilibrium has a consistent meaning. Our rescaling will ensure that, for every possible type of each player, that player's expected payoff uses the entire range [0, 1].

For each type i of the row player, we use \({U} ^i_R\) to denote the maximum expected payoff for the row player when he has type i, and we use \({L} ^i_R\) to denote the minimum expected payoff for the row player when he has type i. Formally, these are defined to be:

$$\begin{aligned} {U} ^i_R&= \max _{a \in [k_1]} \sum _{j=1}^n \max _{b \in [k_2]} \left( p^R_i(j) \cdot R_{ij} \right) _{a, b}, \\ {L} ^i_R&= \min _{a \in [k_1]} \sum _{j=1}^n \min _{b \in [k_2]} \left( p^R_i(j) \cdot R_{ij} \right) _{a, b}. \end{aligned}$$

Then we apply the transformation \(T_R^i(\cdot )\) to every element z of \(R_{ij}\), for all types j of the column player, where:

$$\begin{aligned} T_R^i(z) := \frac{1}{{U} ^i_R-{L} ^i_R} \cdot \left( z - \frac{{L} ^i_R}{n} \right) . \end{aligned}$$
(15)

Similarly, we transform all payoff matrices for the column player using:

$$\begin{aligned} T_C^j(z) := \frac{1}{{U} ^j_C-{L} ^j_C} \cdot \left( z - \frac{{L} ^j_C}{m} \right) , \end{aligned}$$
(16)

where \({U} ^j_C\) and \({L} ^j_C\) are defined symmetrically. Note that, after this transformation has been applied, both players' expected payoffs lie in the range [0, 1]. Moreover, the full range is used: there exists a strategy for the column player against which one of the row player's strategies has expected payoff 1, and there exists a strategy for the column player against which one of the row player's strategies has expected payoff 0; the symmetric statements hold for the column player. From now on we will assume that the payoff matrices have been rescaled in this way.

We can now define approximate Bayesian Nash equilibria for a two-player Bayesian game.

Definition 15

(Approximate Bayes Nash Equilibrium (\(\epsilon \) -BNE)) Let \((\mathbf {x}, \mathbf {y})\) be a strategy profile. The profile \((\mathbf {x}, \mathbf {y})\) is an \(\epsilon \)-BNE iff the following conditions hold:

$$\begin{aligned} u_R(x_i, \mathbf {y})&\ge u_R(x'_i, \mathbf {y}) - \epsilon \quad \text {for all }x'_i \in {\varDelta }_{k_1} \quad \text {for all }i \in [m], \end{aligned}$$
(17)
$$\begin{aligned} u_C(\mathbf {x}, y_j)&\ge u_C(\mathbf {x}, y'_j) - \epsilon \quad \text {for all }y'_j \in {\varDelta }_{k_2} \quad \text {for all }j \in [n]. \end{aligned}$$
(18)

7.2 The Reduction

In this section we reduce in polynomial time the problem of computing an \(\epsilon \)-BNE for a two-player Bayesian game \(\mathcal {B}\) to the problem of computing an \(\epsilon \)-NE of a polymatrix game \(\mathcal {P(B)}\). We describe the construction of \(\mathcal {P(B)}\) and prove that every \(\epsilon \)-NE for \(\mathcal {P(B)}\) maps to an \(\epsilon \)-BNE of \(\mathcal {B}\).

Construction. Let \(\mathcal {B}\) be a two-player Bayesian game where the row player has m types and \(k_1\) pure strategies and the column player has n types and \(k_2\) pure strategies. We will construct a polymatrix game \(\mathcal {P(B)}\) as follows.

The game has \(m + n\) players. We partition the set of players \([m + n]\) into two sets: the set \(K = \{1, 2, \dots , m\}\) will represent the types of the row player in \(\mathcal {B}\), while the set \(L = \{m+1, m+2, \dots , m+n\}\) will represent the types of the column player in \(\mathcal {B}\). The underlying graph that shows the interactions between the players is a complete bipartite graph \(G = (K \cup L, E)\), where every player in K (respectively L) plays a bimatrix game with every player in L (respectively K). The bimatrix game played between vertices \(v_i \in K\) and \(v_j \in L\) is defined to be \((R^*_{ij}, C^*_{ij})\), where:

$$\begin{aligned} R^*_{ij}&:= p^R_i(j) \cdot R_{ij}, \end{aligned}$$
(19)
$$\begin{aligned} C^*_{ij}&:= p^C_j(i) \cdot C_{ij}. \end{aligned}$$
(20)

for all \(i \in [m]\) and \(j \in [n]\).

Observe that, for each player \(i \in K\), the matrices \(R^*_{ij}\) all have the same number of rows, and for each player \(j \in L\), the matrices \(C^*_{ij}\) all have the same number of columns. Thus, \(\mathcal {P(B)}\) is a valid polymatrix game. Moreover, we clearly have that \(\mathcal {P(B)}\) has the same size as the original game \(\mathcal {B}\). Note that, since we have assumed that the Bayesian game has been rescaled, we have that for every player in \(\mathcal {P(B)}\) the minimum (maximum) payoff achievable under pure strategy profiles is 0 (1), so no further scaling is needed in order to apply our algorithm.
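The construction is a direct loop over type pairs. A sketch, reusing the illustrative PolymatrixGame class and the conditional distributions condR and condC from the earlier sketches; player \(m+j\)'s matrix is stored transposed so that it has the \(m_j \times m_i\) shape required by the definition of a polymatrix game:

```python
def bayesian_to_polymatrix(R, C, condR, condC, k1, k2):
    # R[i][j], C[i][j]: the (rescaled) k1 x k2 payoff matrices of B.
    m, n = len(R), len(R[0])
    payoffs = {}
    for i in range(m):
        for j in range(n):
            # Player i (a row type) vs. player m + j (a column type):
            payoffs[(i, m + j)] = condR[i, j] * R[i][j]      # R*_ij, Eq. (19)
            # (C*_ij)^T from Eq. (20), transposed to k2 x k1 so that it
            # gives the payoffs of player m + j.
            payoffs[(m + j, i)] = (condC[i, j] * C[i][j]).T
    return PolymatrixGame([k1] * m + [k2] * n, payoffs)
```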

We can now prove that every \(\epsilon \)-NE of the polymatrix game is also an \(\epsilon \)-BNE of the original two-player Bayesian game, which is the main result of this section.

Theorem 16

Every \(\epsilon \)-NE of \(\mathcal {P(B)}\) is an \(\epsilon \)-BNE for \(\mathcal {B}\).

Proof

Let \(\mathbf {z} = (x_1, \ldots , x_m, y_1, \ldots , y_n)\) be an \(\epsilon \)-NE for \(\mathcal {P(B)}\). This means that no player can gain more than \(\epsilon \) by unilaterally changing his strategy. We define the strategy profile \((\mathbf {x}, \mathbf {y})\) for \(\mathcal {B}\) where \(\mathbf {x} = (x_1, \dots , x_m)\) and \(\mathbf {y} = (y_1, \dots , y_n)\), and we will show that \((\mathbf {x}, \mathbf {y})\) is an \(\epsilon \)-BNE for \(\mathcal {B}\).

Let \(i \in K\) be a player. Since, \(\mathbf {z} \) is an \(\epsilon \)-NE of \(\mathcal {P(B)}\), we have:

$$\begin{aligned} u_i(x_i, \mathbf {z}) \ge u_i(x'_i, \mathbf {z}) - \epsilon \quad \text {for all }x'_i \in {\varDelta }_{k_1}. \end{aligned}$$

By construction, we can see that player i only interacts with the players from L. Hence his payoff can be written as:

$$\begin{aligned} u_i(x_i, \mathbf {z}) = x^T_i \sum _{j=1}^n R^*_{ij}y_j = u_R(x_i, \mathbf {y}). \end{aligned}$$

and since we are in an \(\epsilon \)-NE, we have:

$$\begin{aligned} u_R(x_i, \mathbf {y}) \ge u_R(x'_i, \mathbf {y}) - \epsilon \quad \text {for all }x'_i \in {\varDelta }_{k_1}. \end{aligned}$$
(21)

This is true for all \(i \in K\), thus it is true for all \(i \in [m]\).

Similarly, every player \(j \in L\) interacts only with players from K, thus:

$$\begin{aligned} u_C(\mathbf {x}, y_j) = y^T_j \sum _{i=1}^m (C^*_{ij})^Tx_i. \end{aligned}$$

Since we are in an \(\epsilon \)-NE we have:

$$\begin{aligned} u_C(\mathbf {x}, y_j) \ge u_C(\mathbf {x}, y'_j) - \epsilon \quad \text {for all }y'_j \in {\varDelta }_{k_2}, \end{aligned}$$
(22)

and this is true for all \(j \in L\), thus it is true for all \(j \in [n]\).

Combining now the fact that Eq. (21) is true for all \(i \in [m]\) and that Eq. (22) is true for all \(j \in [n]\), it is easy to see that the strategy profile \((\mathbf {x}, \mathbf {y})\) is an \(\epsilon \)-BNE for \(\mathcal {B}\). \(\square \)

Applying Algorithm 1 to \(\mathcal {P(B)}\) thus gives us the following.

Theorem 17

A \((0.5+\delta )\)-Bayesian Nash equilibrium of a two-player Bayesian game \(\mathcal {B}\) can be found in time polynomial in the input size of \(\mathcal {B}\) and \(1/\delta \).

8 Conclusions and Open Questions

We have presented a polynomial-time algorithm that finds a \((0.5+\delta )\)-Nash equilibrium of a polymatrix game for any \(\delta >0\). Recently it was shown [18] that the performance guarantee that Tsaknakis and Spirakis proved for their algorithm [28] is almost tight. Though we do not have examples that show that the approximation guarantee is tight for our algorithm, we do not see an obvious approach to prove a better guarantee. The initial choice of strategy profile affects our algorithm, and it is conceivable that one may be able to start the algorithm from an efficiently computable profile with certain properties that allow a better approximation guarantee. One natural special case is when there is a constant number of players, which may allow one to derive new strategy profiles from a stationary point as done by Tsaknakis and Spirakis [28]. It may also be possible to develop new techniques when the number of pure strategies available to the players is constant, or when the structure of the graph is restricted in some way. For example, in the games arising from two-player Bayesian games, the graph is always bipartite.

This paper has considered \(\epsilon \)-Nash equilibria, which are the most well-studied type of approximate equilibria. However, \(\epsilon \)-Nash equilibria have a drawback: since they only require that the expected payoff is within \(\epsilon \) of a pure best response, it is possible that a player could be required to place probability on a strategy that is arbitrarily far from being a best response. An alternative, stronger, notion is an \(\epsilon \)-well supported approximate Nash equilibrium (\(\epsilon \)-WSNE). It requires that players only place probability on strategies that have payoff within \(\epsilon \) of a pure best response. Every \(\epsilon \)-WSNE is an \(\epsilon \)-Nash, but the converse is not true. For bimatrix games, the best-known additive approximation that is achievable in polynomial time gives a \(\bigl (\frac{2}{3}-0.0047\bigr )\)-WSNE [17]. It builds on the algorithm given by Kontogiannis and Spirakis that achieves a \(\frac{2}{3}\)-WSNE in polynomial time [25]. Recently a polynomial-time algorithm with a better approximation guarantee has been given for symmetric bimatrix games [9]. Note that it has been shown that there is a PTAS for finding \(\epsilon \)-WSNE of bimatrix games if and only if there is a PTAS for \(\epsilon \)-Nash [6, 11]. For n-player games with \(n>2\) there has been very little work on developing algorithms for finding \(\epsilon \)-WSNE. This is a very interesting direction, both in general and when \(n>2\) is a constant.