1 Introduction

Stochastic learning models provide a rich framework for studying the evolution of human behavior in socioeconomic contexts and hence offer “evolutionary” foundations for the selection of Nash equilibria. In most studies in this field, agents are assumed to be homogeneous and boundedly rational. Under this assumption, these studies mainly manipulate the stage game, the agents’ interaction structure, the information agents can observe, and the strategy-updating rule, in order to examine their effects on behavioral evolution in the long run. Although, in reality, agents are very likely to be heterogeneous (e.g., people may use different behavioral rules or have different rationality levels), this possibility is not sufficiently explored in the literature on stochastic learning. The purpose of this paper is to model the heterogeneity of players’ rationality levels and to show its crucial effect on strategic interactions in the long run, within the specific framework of “location models.”

Our model consists of two stages and two locations. In stage 1, the rational manager in each location chooses the maximum capacity to optimize a certain objective function, which is related to the payoffs of the agents in his location. After that, there is a learning dynamics in stage 2. In every period, the boundedly rational agents in each location repeatedly play a \(2 \times 2\) coordination game, in which one strategy is more efficient, while the other is safer. Across periods, the agents adjust their choices using a certain behavioral rule, with the opportunity to relocate.

The principal feature of this model is that it involves players with different rationality levels. The rational managers play a one-shot game first, followed by the stochastic learning dynamics of the boundedly rational agents. Although players with different rationality levels do not interact directly, the decisions of the managers in stage 1 will have a non-trivial effect on the strategic choices in the coordination game in the long run. Within this framework, we would like to answer the following questions: What are the capacity choices of the managers? What are the choices of the agents in different locations in the long run?

We present two examples related to the model outlined above. Consider two investors, each of whom plans to build an industrial park for firms that specialize in software development. To do this, each investor has to choose the maximum capacity of his park. Suppose that, for each firm, there are two different development platforms available: One is more efficient, and the other is more compatible. Suppose also that firms are so small that it is beneficial for each of them to use only one platform at a time and to have joint development programs with the others in the same industrial park. Coordination issues then arise when two firms are engaged in a joint program: The payoffs for both firms are higher when they use the same platform than when they use different ones. To obtain higher payoffs, firms can change their platforms and locations across periods. Assume that the profit of the investor of each industrial park is related to the payoffs of the firms within his park in a certain way (e.g., each investor charges a fixed percentage of the payoffs of the firms located in his park). Hence, the investors care about the coordination among the firms and would like to affect the firms’ platform choices within their respective parks by choosing the capacity constraint at the very beginning.

A second example is the interaction between firms’ hiring policies and employees’ choices of unobservable effort levels. Suppose that two firms are recruiting employees in the labor market, and both of them first have to decide how many employees to hire. The employees can choose between two different effort levels (high and low), which are unobservable. Both firms use the same joint-production technology as described in Khan (2014). That is, the employees have to work in pairs, and the output level of a pair is determined by the minimum effort. The wage of each employee hinges on the output level. After adjusting employees’ wages for the cost of their effort levels, this situation can be captured by a coordination game. Coordinating on high effort is preferred to coordinating on low effort; in case of mis-coordination, the payoff of the employee with low effort is higher than that of the employee with high effort. To get higher payoffs, employees can change their effort levels or firms. Assume that the profit of each firm depends on how employees coordinate with their colleagues. Since the managers cannot observe the employees’ effort levels, they cannot simply require them to work with high effort. Hence, to maximize profits, the managers would like to use capacity choices to affect the coordination among the employees.

In either case, we are interested in investigating (i) how the long-run consequences of the learning dynamics influence the rational players’ optimal choices, considering that they have access to all the details of the learning dynamics, and, in turn, (ii) how the rational players’ one-shot choices ultimately shape the behavioral pattern of the boundedly rational players in the long run. Through this exercise, we study the evolution of social behavior in a more realistic situation, where players with heterogeneous rationality levels interact. We illustrate, within our framework, that the interaction among heterogeneous players indeed has a crucial effect on the strategic choices of both the rational players and the boundedly rational ones.

We first consider the case where both managers care about the average of the agents’ payoffs in their respective locations. We identify two classes of Nash equilibria (NE) in the managers’ game. In one class, both managers choose relatively small capacities, which leads to global coordination on the risk-dominant equilibrium in the long run. In the other class, one location is relatively large and the other is relatively small, which leads the agents in the small location to coordinate on the Pareto-efficient equilibrium and those in the large one to coordinate on the risk-dominant equilibrium (coexistence of conventions).

We then extend the basic model in the following three directions. First, we consider the case where the managers are concerned with the total payoffs of the agents in their respective locations. We find that, in this case, for \(N\) large enough, there is only one possible candidate for a NE. Second, we allow the managers to also restrict the mobility of the agents in their locations. We show that, if the managers are only concerned with efficiency, the result is similar to the case where the managers can only choose capacity constraints. However, if they care about scale, they will use the mobility restriction to forbid migration completely, which leads agents in both locations to coordinate on the risk-dominant equilibrium. Third, we change the basic interaction to a pure coordination game with more than two strategies. We show that, when mis-coordination leads to zero payoffs, all the agents will coordinate on the most efficient equilibrium in the long run. Focusing on the situation where the managers are concerned with scale, we find that, if they can choose capacity constraints only, each of them will set a capacity large enough to accommodate all the agents; if they can also choose mobility constraints, no agents will be allowed to move out.

A series of studies in the stochastic learning literature is related to the work in this paper. In our model, the basic interactions in the learning dynamics are coordination games. Hence, our model is related to studies that apply evolutionary models to select equilibria in coordination games (see, for example, Kandori et al. 1993; Ellison 1993; Robson and Vega-Redondo 1996; Alós-Ferrer and Weidenholzer 2007, 2008; Alós-Ferrer 2008; Alós-Ferrer and Netzer 2014). Our model differs from those studies in that we introduce strategic interaction between rational managers, which has an important impact on equilibrium selection.

Our model involves two different locations, and boundedly rational agents are allowed to change their locations. Hence, it encompasses the multiple-location models of Ely (2002) and Anwar (2002). In Ely (2002), it is assumed that each location is large enough to accommodate all the agents, and there is no mobility constraint. He shows that, in this case, all the agents will agglomerate in one location and coordinate on the Pareto-efficient equilibrium. Anwar (2002) considers the case where both locations have the same restrictive capacity and mobility constraints. He finds that, if the effective capacity of both locations is small, the risk-dominant equilibrium will prevail globally; if the effective capacity of both locations is large, agents in different locations will coordinate on different equilibria. Instead of taking capacity and mobility constraints as exogenous parameters, we allow the managers of both locations to choose these constraints. We find that the optimal choices of the managers do not include the policy arrangements in Ely (2002) and Anwar (2002). This paper is also related to Ania and Wagener (2009). Both models share the feature that there are multiple locations, each with a manager, and that agents react to the policies and choose locations if possible. However, the major difference is that, in their study, the managers are not rational: Instead of making a one-shot decision, they use an imitate-the-best rule to change their policies every period.

In this paper, players with different rationality levels interact; hence, the paper is related to several studies involving heterogeneous players. Alós-Ferrer et al. (2010) develop a two-stage model where rational platform designers attempt to maximize the profits of their respective platforms, and boundedly rational traders choose among the platforms by imitating the most successful behavior in the previous period. Similar to our model, rational players make a one-shot decision first, and boundedly rational players are involved in a learning dynamics afterward. Unlike our model, though, the basic interaction in the learning dynamics is Cournot competition rather than a coordination game. Juang (2002) considers the case where players use different behavioral rules to update their strategies in an evolutionary model of coordination games. Gerber and Bettzüge (2007) investigate the evolutionary choice of asset markets when boundedly rational traders have idiosyncratic preferences for the markets. Alós-Ferrer and Shi (2012) explore the effect of asymmetric memory length in the population on the long-run consequences of stochastic learning. Although all these models feature heterogeneous players, ours differs from the latter three studies in that the different types of players in our setup play sequentially.

The remainder of the paper is organized as follows. Section 2 develops a simple two-stage model. Using this model, we analyze the long-run outcome of the learning dynamics and investigate the strategic capacity choices of the managers. Section 3 discusses three extensions. In the first one, we suppose that managers are concerned with the scales of their respective locations. In the second one, we allow managers to also choose mobility constraints. In the last one, we consider a general \(n\times n\) pure coordination game with more than two strategies. Section 4 briefly concludes. All the proofs are relegated to the Appendix.

2 The two-stage model

In this section, we develop a simple two-stage model, where rational managers choose capacity constraints and boundedly rational agents are involved in a learning dynamics of coordination games. Then, we use this model to analyze the long-run behavior of the agents and the strategic choices of the managers.

2.1 The setting

2.1.1 Locations

A total of \(2N\) agents are distributed in two different locations \(k \in \{1,2\}\), initially with \(N\) agents in each location. That is, the two locations are identical in the very beginning. Each location \(k\) has a rational manager, referred to as manager \(k\).

2.1.2 Managers

The model involves two stages. Stage 1 consists of a static game between the two rational managers. The managers can neither relocate nor interact with the agents. Instead, to optimize a certain objective function, manager \(k\) can choose a capacity constraint \(c_k \in [1,2]\) such that \(\lfloor c_k N \rfloor \) determines the maximum capacity of location \(k\). Since the two locations compose the whole world in this model, we assume that each location can accommodate at least \(N\) agents (\(c_k \ge 1\) for both \(k=1,2\)), to avoid the situation where there is no place for some agents to stay (which would arise if \(c_1+c_2<2\)). Hence, the capacity of location \(k\) can be represented by \(M_k \equiv \lfloor c_kN \rfloor \), and the minimum number of players in location \(k\) is \(m_k \equiv 2N-M_\ell \), for \(k,\ell \in \{1,2\}, k \ne \ell \).

2.1.3 Learning dynamics of agents

Stage 2 of the model consists of a discrete-time learning dynamics, given the capacity constraints determined by the managers in both locations. It is related to Anwar (2002) and includes Ely (2002) as a particular case.

Basic coordination game Time is discrete, i.e., \(t=1,2,\ldots \). In each period \(t\), agents within each location play a pure coordination game (Table 1) with each other (i.e., a round-robin tournament), where \(e,r,g,h >0\), \(e>r>h\), \(r>g\), and \(h+r>e+g\). Hence, \((P,P)\) is the Pareto-efficient equilibrium and \((R,R)\) is the risk-dominant equilibrium. We focus on pure coordination games to capture the fact that, if two agents use different technologies, the incompatibility leads their respective payoffs to be lower than what they could obtain by both using the less efficient technology. Let \(\varPi :S \times S \rightarrow \mathbb {R}^+\) be the payoff function of the game, where \(S=\{P,R\}\) is the strategy set of each player. \(\varPi (s_i,s_j)\) denotes the payoff of playing \(s_i\) against \(s_j\). We denote by \(q^*\) the probability of playing \(P\) in the mixed-strategy Nash equilibrium, i.e., \(q^*=(r-g)/(e-g-h+r)\). Since \((R,R)\) is risk-dominant, \(q^*\) is strictly larger than \(1/2\). To facilitate the discussion later, we also define \(\hat{q}\) through the equality \(\varPi (P,(\hat{q},1-\hat{q}))=r\), which yields \(\hat{q}=(r-g)/(e-g)\). That is, \(\hat{q}\) is the probability of playing \(P\) in a mixed strategy such that playing \(P\) against it gives the same payoff as in the risk-dominant equilibrium. Since every agent repeatedly plays the coordination game with all the others in one period, we assume that each agent’s payoff in one period is the average payoff over all his interactions.

Table 1 The basic coordination game

$$\begin{aligned} \begin{array}{c|cc} & P & R \\ \hline P & e,e & g,h \\ R & h,g & r,r \end{array} \end{aligned}$$
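
As a quick numerical illustration, the following sketch computes \(q^*\) and \(\hat{q}\) from the payoff parameters. The values \(e=10\), \(r=8\), \(g=1\), \(h=5\) are our own illustrative choice satisfying the assumptions above, not parameters taken from the model.

```python
# A minimal sketch; the payoff values are illustrative assumptions.
E, R_, G, H = 10.0, 8.0, 1.0, 5.0   # e > r > h, r > g, and h + r > e + g

def payoff(s_i, s_j):
    """Pi(s_i, s_j): payoff of playing s_i against s_j in Table 1."""
    if s_i == 'P':
        return E if s_j == 'P' else G
    return H if s_j == 'P' else R_

assert E > R_ > H and R_ > G and H + R_ > E + G   # model assumptions

q_star = (R_ - G) / (E - G - H + R_)   # P-weight in the mixed equilibrium
q_hat = (R_ - G) / (E - G)             # Pi(P, (q_hat, 1 - q_hat)) = r

print(q_star, q_hat)  # ~0.583 and ~0.778; q_star > 1/2 as (R,R) is risk-dominant
```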

Behavioral rule In every period, agents can adjust their strategies in the coordination game. Further, each agent can relocate with a positive probability that is independent and identical across agents and periods. We assume that agents update their behavior by myopic best reply to the state. That is, each agent will choose a strategy and a location (if possible) that is a best reply to the strategy profile in the last period.

Formally, let \(q^t_k\) be the proportion of \(P\)-players in location \(k \in \{1,2\}\) in period \(t\). We assume that, in each period \(t+1\), every player observes \(q^t_k\) for both \(k=1,2\), and anticipates the payoff of playing \(s \in \{P,R\}\) in location \(k\), \(\bar{\varPi }^{t+1} (s, k)\), based on the strategy distribution of players in location \(k\) in period \(t\); that is,

$$\begin{aligned} \bar{\varPi }^{t+1} (s, k) \equiv \varPi (s,(q^t_k,1-q^t_k))= q^t_k\varPi (s,P)+(1-q^t_k) \varPi (s,R). \end{aligned}$$
(1)

Then, each player will choose a strategy and a location (when such an opportunity arises) that maximizes \(\bar{\varPi }^{t+1} (s, k)\), given the capacity constraints. Specifically, an immobile agent will choose a strategy \(s^{t+1}_i\in \{P,R\}\) in period \(t+1\) given his current location, such that

$$\begin{aligned} s^{t+1}_i \in \arg \max _{s'_i} \bar{\varPi }^{t+1} (s'_i, k^t_i), \end{aligned}$$
(2)

where \(k^t_i\) denotes agent \(i\)’s location in period \(t\). A mobile agent will choose a strategy \(s^{t+1}_i\) and a location \(k^{t+1}_i\), such that

$$\begin{aligned} (s^{t+1}_i,k^{t+1}_i) \in \arg \max _{(s'_i,k'_i)} \bar{\varPi }^{t+1} (s'_i, k'_i). \end{aligned}$$
(3)

If several choices give a player the maximum payoff, he will play each of them with a positive probability. Relocation is restricted by the capacity constraints of the two locations. If the number of agents who attempt to move to location \(k\) exceeds the number of vacancies, agents will be randomly chosen from the set of candidates to fill all the vacancies. Those who are not allowed to move will behave as immobile agents and play the myopic best reply to the state given their current locations.
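
To fix ideas, the following is a minimal simulation sketch of one period of this unperturbed dynamics, building on the payoff function sketched above. The details of the revision protocol here (every agent revising each period, the value of the relocation probability move_prob, and the sequential rationing of excess movers) are simplifying assumptions of the sketch, not specifications taken from the model.

```python
import random

def expected_payoff(s, q):
    """Anticipated payoff (1) of playing s in a location whose P-share is q."""
    return q * payoff(s, 'P') + (1 - q) * payoff(s, 'R')

def one_period(state, N, M1, M2, move_prob=0.1):
    """One step of the unperturbed dynamics on a state (v1, v2, n1)."""
    v1, v2, n1 = state
    n2 = 2 * N - n1
    q = {1: v1 / n1 if n1 > 0 else 0.0, 2: v2 / n2 if n2 > 0 else 0.0}
    cap = {1: M1, 2: M2}

    def best(options):  # options: (strategy, location) pairs; uniform tie-breaking
        top = max(expected_payoff(s, q[k]) for s, k in options)
        return random.choice([(s, k) for s, k in options
                              if expected_payoff(s, q[k]) == top])

    plans = []  # (current location, (new strategy, desired location))
    for loc in [1] * n1 + [2] * n2:
        # An agent's own last-period strategy is irrelevant here: his best
        # reply depends only on the state (q1, q2), cf. Remark 1 below.
        if random.random() < move_prob:                  # mobile agent, cf. (3)
            plans.append((loc, best([(s, k) for s in 'PR' for k in (1, 2)])))
        else:                                            # immobile agent, cf. (2)
            plans.append((loc, best([(s, loc) for s in 'PR'])))

    # Ration entry: randomly chosen excess movers are rejected, stay put,
    # and best-reply within their current location.
    for k in (1, 2):
        over = sum(1 for _, (_, dest) in plans if dest == k) - cap[k]
        if over > 0:
            movers = [i for i, (loc, (_, dest)) in enumerate(plans)
                      if loc != k and dest == k]
            for i in random.sample(movers, over):
                loc = plans[i][0]
                plans[i] = (loc, best([(s, loc) for s in 'PR']))

    return (sum(1 for _, (s, d) in plans if d == 1 and s == 'P'),
            sum(1 for _, (s, d) in plans if d == 2 and s == 'P'),
            sum(1 for _, (_, d) in plans if d == 1))

# e.g., N = 10 and c1 = c2 = 1.2 (so M1 = M2 = 12), starting from the
# coexistence state omega = (1, 0, M1): one_period((12, 0, 12), 10, 12, 12)
```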

Remark 1

When computing the best response to the state in the other location, this approach is equivalent to the standard myopic best reply; that is, a player chooses a strategy to maximize his payoff, given the strategies played by all the remaining players in the last period. However, when computing the best response to the state in the current location, this approach differs from the standard myopic best reply, in that the player counts himself in the last period as an opponent. This corresponds to Oechssler’s (1997) best response for a large population: If the population is large enough, it is innocuous for an agent to include himself in the strategy profile of the opponents when he computes his payoff, since the strategy of one player has almost no effect on the strategy distribution of a large population. For a small population, though, this kind of behavior can differ from the standard myopic best reply, as the agent ignores the fact that changing his strategy also changes the population proportions at his location. In particular, when there is only one player in a location, he will behave as if he could play with himself (i.e., self-matching).

2.2 The long-run behavior of agents

This subsection analyzes the learning dynamics in stage 2. We first identify the absorbing sets of the unperturbed learning dynamics. From these candidates, we single out the long-run equilibria (LRE) given the capacity constraints in stage 1.

2.2.1 The absorbing sets

Let \(\omega =(v_1,v_2,n_1)\) represent a state of the dynamics described above, where \(v_k\) denotes the number of \(P\)-players in location \(k\), and \(n_k\) denotes the total number of players in location \(k\) for \(k=1,2\) (so that \(n_2=2N-n_1\)). The state space is given by

$$\begin{aligned} \varOmega = \{(v_1,v_2,n_1)\,|\, v_1 \in \{0,1,\ldots ,n_1\}, v_2 \in \{0,1,\ldots ,2N-n_1\}, n_1 \in \{m_1,\ldots , M_1 \}\}. \end{aligned}$$

Denote

$$\begin{aligned} q_1(\omega )=\left\{ \begin{array}{ll} \frac{v_1}{n_1}, & \quad \text{if } n_1 > 0\\ 0, & \quad \text{otherwise} \end{array} \right. \quad \text{and} \quad q_2(\omega )=\left\{ \begin{array}{ll} \frac{v_2}{2N-n_1}, & \quad \text{if } n_1<2N \\ 0, & \quad \text{otherwise} \end{array} \right. \end{aligned}$$
(4)

We can equivalently rewrite a state as \(\omega =(q_1,q_2,n_1)\). The stochastic dynamics above gives rise to a Markov process, whose transition matrix is given by \(P=[P(\omega , \omega ')]_{\omega ,\omega ' \in \varOmega }\). An absorbing set is a minimal subset of states such that, once the process reaches it, it never leaves.

Lemma 1

The absorbing sets of the unperturbed process above depend on \(c_k\) for \(k=1,2\) as follows.

  1. (a)

    If \(c_k<2\) for both \(k=1,2\), there are four absorbing sets: \(\varOmega (PR)\), \(\varOmega (RP)\), \(\varOmega (RR)\) and \(\varOmega (PP)\).

  2. (b)

    If \(c_1=2\) and \(c_2 < 2\), there are three absorbing sets: \(\varOmega (RO)\), \(\varOmega (PO)\), and \(\varOmega (RP)\). Similarly, if \(c_2=2\) and \(c_1< 2\), the absorbing sets are: \(\varOmega (OR)\), \(\varOmega (OP)\) and \(\varOmega (PR)\).

  3. (c)

    If \(c_1=c_2=2\), there are four absorbing sets: \(\varOmega (RO)\), \(\varOmega (OR)\), \(\varOmega (PO)\) and \(\varOmega (OP)\).

where \(\varOmega (PR)=\{(1, 0, M_1)\}\), \(\varOmega (RP)=\{(0, 1, m_1)\}\), \(\varOmega (RR)=\{(0,0,n_1)|\) \(n_1 \in \{m_1,\ldots ,M_1\} \}\), \(\varOmega (PP)=\{(1,1,n_1)|n_1 \in \{m_1, \ldots , M_1\} \}\), \(\varOmega (RO)=\{(0,0,2N)\}\), \(\varOmega (PO)=\{(1,0,2N)\}\), \(\varOmega (OR)=\{(0,0,0)\}\), and \(\varOmega (OP)=\{(0,1,0)\}\).

2.2.2 The equilibrium concept

Following the standard approach in the literature on stochastic learning in games, we introduce mutations in this Markov dynamics. We assume that, in each period, with an independent and identical probability \(\varepsilon \), each agent randomly chooses a strategy and a location if possible. The perturbed Markov dynamics is ergodic, i.e., there is a unique invariant distribution \(\mu (\varepsilon )\). We want to consider small perturbations. It is a well-established result that \(\mu ^{*}=\lim _{\varepsilon \rightarrow 0}\mu (\varepsilon )\) exists and is an invariant distribution of the unperturbed process \(P\). It describes the time average spent in each state when the original dynamics is slightly perturbed and time goes to infinity. The states in its support, \(\{\omega | \mu ^{*}(\omega )>0\}\), are called stochastically stable states or long-run equilibria.
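
For very small populations, this definition can be checked numerically. The helper below computes the invariant distribution of an ergodic transition matrix; the matrix constructor P_eps in the commented usage is hypothetical (it would have to encode the perturbed dynamics above) and is shown only to illustrate the limit exercise.

```python
import numpy as np

def invariant_distribution(P):
    """Stationary mu of an ergodic row-stochastic P: mu P = mu, sum(mu) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # stationarity + normalization
    b = np.concatenate([np.zeros(n), [1.0]])
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

# Sketch of the selection exercise (P_eps is a hypothetical constructor of
# the perturbed transition matrix for mutation rate eps):
# for eps in (1e-1, 1e-2, 1e-3):
#     mu = invariant_distribution(P_eps(eps))
#     print(eps, np.flatnonzero(mu > 0.01))  # surviving states approximate the LRE
```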

For small population size, the analysis of LRE in our model runs into integer problems, which arise because both the number of agents in each location and the number of mutants involved in any considered transition need to be integers. For this reason, the boundaries of the sets in the \((c_1,c_2)\)-space in which different LRE are selected are highly irregular. In the limit, these boundaries become clear-cut, but, for a fixed population size, there is always a small area where long-run outcomes are not clear. However, for large population size, these boundary areas effectively vanish. In order to tackle this difficulty, we introduce the following definition.

Definition 1

\(\tilde{\varOmega }\) is the LRE set for large \(N\) uniformly for the set of conditions \(\{J_z (c_1,c_2)\) \(>0\}^Z_{z=1}\) if, for any \(\eta >0\), there exists an integer \(N_\eta \) such that for all \(N>N_\eta \), \(\tilde{\varOmega }\) is the set of LRE for all \((c_1,c_2) \in [1,2]^2\) such that \(\{J_z (c_1,c_2)>\eta \}^Z_{z=1}\).

In words, an LRE set for large population size is such that, given “ideal” boundaries described by certain inequalities (which will be specified in each of our results below), for every point bounded away from the boundaries, there exists a minimal population size beyond which the LRE correspond to the given set. Further, the definition incorporates a uniform convergence requirement, in the sense that the minimal population size depends only on the distance to the ideal boundaries and not on the individual point. This uniform convergence property is important for our analysis of the managers’ game in stage 1.

2.2.3 The LRE in stage 2

Following the standard approach developed by Freidlin and Wentzell (1988), we construct the minimum-cost tree for each absorbing set given in Lemma 1 and then use these trees to identify the LRE. We summarize the results in Theorem 1 and illustrate them in Fig. 1. Denote \(\bar{q} \equiv 1/q^*-2+2q^*\) and let

$$\begin{aligned} \varPhi (c_k)=\left\{ \begin{array}{ll} 2-\frac{1-q^*}{q^*}c_k, & \ \text{if } \hat{q} \le \bar{q}\\ 2-\frac{1-\hat{q}}{2q^*-1}c_k, & \ \text{if } \hat{q} > \bar{q}. \end{array} \right. \end{aligned}$$

Theorem 1

The LRE of the learning dynamics in stage 2 depend on \(c_1\) and \(c_2\).

  1. (a)

    \(\varOmega (RR)\) is the LRE set for large \(N\) uniformly for \(c_1 < \varPhi (c_2)\) and \(c_2 < \varPhi (c_1)\);

  2. (b)

    \(\varOmega (RP)\) is the LRE set for large \(N\) uniformly for \(c_1 > \varPhi (c_2)\) and \(c_1 > c_2\); and

  3. (c)

    \(\varOmega (PR)\) is the LRE set for large \(N\) uniformly for \(c_2 > \varPhi (c_1)\) and \(c_1 < c_2\).

Fig. 1 An illustration of the LRE set in the main area and the subregions of the vanishing area

Theorem 1 says that, for \(N\) large enough, in the main area of the \((c_1,c_2)\)-space, only the elements of these three absorbing sets (\(\varOmega (RR)\), \(\varOmega (RP)\) and \(\varOmega (PR)\)) can possibly be selected as stochastically stable. If the maximum capacities of both locations are relatively small, the risk-dominant equilibrium will prevail globally (statement (a)). To see the intuition, consider the transitions between \(\varOmega (PR)\) and \(\varOmega (RR)\), which play a key role in the selection of \(\varOmega (RR)\). To complete the transition from \(\varOmega (PR)\) to \(\varOmega (RR)\), it is necessary to change the convention in location 1 from \((P,P)\) to \((R,R)\). The transition in the opposite direction requires the reverse change. The absorbing set \(\varOmega (RR)\) will be selected if the number of mutants required for the former transition is smaller than that for the latter. For this to hold, the maximum capacity of location 1 (determined by \(c_1\)) has to be small enough, and the minimum number of agents in location 1 (determined by \(2-c_2\)) has to be large enough. Hence, both \(c_1\) and \(c_2\) are required to be small enough for the selection of \(\varOmega (RR)\).

However, if the capacity of one location is larger than that of the other, the agents in the larger location will coordinate on the risk-dominant equilibrium, while those in the smaller location will coordinate on the Pareto-efficient one (statements (b) and (c), respectively). The intuition can be understood by considering the transitions between \(\varOmega (PR)\) and \(\varOmega (RP)\), which play a decisive role in the selection of the coexistence of conventions. The minimum-cost transition in either direction is achieved indirectly through \(\varOmega (PP)\). That is, the convention of one location is first changed from \((R,R)\) to \((P,P)\); then, that of the other location is changed from \((P,P)\) to \((R,R)\). Since the population share of mutants required for the first-step transition, \(q^*\), is larger than the share \(1-q^*\) required for the second step, the former transition has the dominant effect. Hence, we can focus on the first-step transition only. In the transition from \(\varOmega (PR)\) to \(\varOmega (PP)\), the number of mutants is determined by the minimum number of agents in location 2 (determined by \(2-c_1\)). Similarly, for the transition in the opposite direction, the number of mutants is determined by the minimum number of agents in location 1 (determined by \(2-c_2\)). To select \(\varOmega (RP)\) in the long run, the number of mutants required for the transition from \(\varOmega (PR)\) to \(\varOmega (PP)\) has to be less than that for the transition in the reverse direction, which holds for \(c_1>c_2\).

Further, the parameter regions of the different LRE depend on the values of \(\hat{q}\) and \(\bar{q}\). If \(\hat{q} \le \bar{q}\), the payoff of the risk-dominant equilibrium is relatively low. In this case, \(\varOmega (RR)\) is selected in a relatively small parameter region. By contrast, if \(\hat{q} > \bar{q}\), the payoff of the risk-dominant equilibrium is relatively high, and \(\varOmega (RR)\) is selected in a larger parameter region. This is reflected by the fact that, for \(\hat{q}>\bar{q}\), \(2-\frac{1-\hat{q}}{2q^*-1}c_k > 2-\frac{1-q^*}{q^*}c_k\) for \(c_k \in [1,2]\), \(k=1,2\). The reason is that, when the payoff of the risk-dominant equilibrium is larger, the basin of attraction of \((R,R)\) is larger, and the transition toward \(\varOmega (RR)\) requires fewer mutants.
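
Continuing the numerical sketch above, the following lines evaluate \(\bar{q}\) and \(\varPhi \) and classify points of the \((c_1,c_2)\)-space according to Theorem 1 (again under our illustrative payoffs, for which \(\hat{q} \le \bar{q}\)).

```python
q_bar = 1 / q_star - 2 + 2 * q_star    # ~0.881 for the illustrative payoffs

def phi(c):
    """The boundary Phi(c_k) of Theorem 1."""
    if q_hat <= q_bar:
        return 2 - (1 - q_star) / q_star * c
    return 2 - (1 - q_hat) / (2 * q_star - 1) * c

def main_area_lre(c1, c2):
    """LRE selection in the main area of the (c1, c2)-space (Theorem 1)."""
    if c1 < phi(c2) and c2 < phi(c1):
        return 'RR'    # global coordination on the risk-dominant equilibrium
    if c1 > phi(c2) and c1 > c2:
        return 'RP'    # R in the larger-capacity location 1, P in location 2
    if c2 > phi(c1) and c2 > c1:
        return 'PR'
    return 'boundary / vanishing area'

print(main_area_lre(1.0, 1.0), main_area_lre(1.9, 1.1))   # -> RR RP
```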

The LRE in the vanishing areas are not clear-cut; multiple LRE may coexist. This result is summarized in Proposition 1. To facilitate the discussion, we define several subareas of the \((c_1,c_2)\)-space, also illustrated in Fig. 1. For any \(\eta >0\), denote the area on the border of the three main regions by \(V(\eta ) = [1,2]^2 \setminus (\{\varPhi (c_2)-c_1 > \eta \ \text{ and } \ \varPhi (c_1)-c_2 >\eta \}\cup \{c_1 -\varPhi (c_2)>\eta \ \text{ and } \ c_1 - c_2>\eta \} \cup \{c_2 -\varPhi (c_1)>\eta \ \text{ and } \ c_2 - c_1>\eta \})\), and the subareas of \(V(\eta )\), respectively, by

$$\begin{aligned}&V_{a}(\eta ) = V(\eta ) \cap \{(c_1,c_2)\,|\, c_1 -c_2 > \eta , \varPhi (c_1)- c_2 > \eta \};\\&V_{b}(\eta ) = V(\eta ) \cap \{(c_1,c_2)\,|\, c_2 - c_1> \eta , \varPhi (c_2)-c_1 > \eta \};\\&V_{c}(\eta )= \{(c_1,c_2)\,|\, 2- c_1 \le \eta , 2-c_2 \le \eta \};\\&V_{d}(\eta ) =V(\eta ) \cap \{(c_1,c_2)\,|\, c_1 -\varPhi (c_2) > \eta , c_2-\varPhi (c_1) > \eta \}\setminus V_c(\eta );\\&V_e(\eta ) =V(\eta )\setminus (V_{a}(\eta ) \cup V_{b}(\eta ) \cup V_{c}(\eta ) \cup V_{d}(\eta )). \end{aligned}$$

Note that \(V(\eta )\) is vanishing as \(\eta \) decreases: \(V(\eta )\) will shrink to the lower-dimensional set \(\{(c_1,c_2)|c_1=\varPhi (c_2) \ \text{ for } \ c_2 \in [1,2q^*]\} \cup \{(c_1,c_2)|c_2=\varPhi (c_1) \ \text{ for } \ c_1 \in [1,2q^*]\} \cup \{(c_1,c_2)|c_1=c_2 \ \text{ for } \ c_2 \in [2q^*,2]\}\) as \(\eta \rightarrow 0\).

Proposition 1

For any \(\eta >0\), there exists an integer \(\bar{N}\), such that for all \(N > \bar{N}\), for \((c_1,c_2) \in V(\eta )\), the LRE form a subset of

  1. (a)

    \(\varOmega (RP) \cup \varOmega (RR)\) if \((c_1,c_2) \in V_{a}(\eta )\);

  2. (b)

    \(\varOmega (PR) \cup \varOmega (RR)\) if \((c_1,c_2) \in V_{b}(\eta )\);

  3. (c)

    \(\varOmega (PR)\cup \varOmega (RP)\cup \varOmega (OP) \cup \varOmega (PO) \cup \varOmega (PP)\) if \((c_1,c_2) \in V_c(\eta )\setminus \{(2, 2)\}\);

  4. (d)

    \(\varOmega (PR)\cup \varOmega (RP)\) if \((c_1,c_2) \in V_{d}(\eta )\); and

  5. (e)

    \(\varOmega (PR)\cup \varOmega (RP) \cup \varOmega (RR)\) if \((c_1,c_2) \in V_e(\eta )\).

  6. (f)

    Further, for \(c_1=c_2=2\), the LRE are the elements of \(\varOmega (PO) \cup \varOmega (OP)\); that is, all the agents agglomerate in one location and coordinate on the Pareto-efficient equilibrium.

2.3 The equilibrium policies

This section analyzes the optimal choices of the two managers on the capacity constraints. The managers in both locations are assumed to be rational. They have perfect information about the learning dynamics and can accurately anticipate the long-run consequences of the capacity choices in both locations. Hence, given this knowledge, each manager makes a one-shot decision on the capacity constraint \(c_k \in [1,2]\) of his own location to achieve the particular objective that he is pursuing.

In this section, we consider the case where the objective function of each manager is associated with the average of the payoffs of the agents in his location in the long run (we explore the case where the managers care about scale in Sect. 3). It captures the situation where only the efficiency of the coordination matters. For example, in a research group, the success of a long-run project is usually determined by how efficiently the group members can coordinate with each other and has little to do with the scale of the group.

Let \(x_k(\omega )\) be the number of agents in location \(k\) in state \(\omega \) and \(y_k(\omega )\) be the number of \(P\)-players in location \(k\) in state \(\omega \). Hence, for any state \(\omega =(v_1,v_2,n_1)\), we have \(x_k(\omega )=n_k\) and \(y_k(\omega )=v_k\). Given a state \(\omega \), denote the payoffs of \(P\)-players and \(R\)-players in location \(k\), respectively, by

$$\begin{aligned} \pi ^P_k(\omega )&= \frac{y_k(\omega )-1}{x_k(\omega )-1}\varPi (P,P)+\frac{x_k(\omega )-y_k(\omega )}{x_k(\omega )-1}\varPi (P,R)\\ \pi ^R_k(\omega )&= \frac{y_k(\omega )}{x_k(\omega )-1} \varPi (R,P)+\frac{x_k(\omega )-y_k(\omega )-1}{x_k(\omega )-1}\varPi (R,R) \end{aligned}$$

for \(x_k(\omega )>1\). Note that \(\pi ^P_k(\omega )\) is not defined for \(y_k(\omega )=0\), and \(\pi ^R_k(\omega )\) is not defined for \(y_k(\omega )=x_k(\omega )\); in either case, there are no players of the corresponding type in location \(k\), so these payoffs are not needed. If there is only one player in a location, then the player cannot find a partner to play the game, and we assume that the payoff of this player is zero.

Hence, the average of the payoffs of the agents in location \(k\) in state \(\omega \) is

$$\begin{aligned} \bar{\pi }_k(\omega ) = \frac{y_k(\omega )}{x_k(\omega )} \pi ^P_k(\omega )+\frac{x_k(\omega )-y_k(\omega )}{x_k(\omega )}\pi ^R_k(\omega ) \end{aligned}$$
(5)

for \(x_k(\omega ) > 0\) (\(k=1,2\)). It is natural to assume that, if a location \(k\) is empty, then its payoff is zero; that is, if \(x_k(\omega )=0\), then \(\bar{\pi }_k(\omega )=0\).
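
In code, (5) together with the conventions for lone agents and empty locations reads as follows (a direct transcription, reusing the illustrative payoff function sketched in Sect. 2.1):

```python
def avg_payoff(v, x):
    """Average one-period payoff (5) in a location with x agents, v playing P."""
    if x <= 1:
        return 0.0   # empty location, or a lone agent without a partner
    total = 0.0
    if v > 0:        # pi^P_k, defined only if there is at least one P-player
        total += v * ((v - 1) * payoff('P', 'P')
                      + (x - v) * payoff('P', 'R')) / (x - 1)
    if x - v > 0:    # pi^R_k, defined only if there is at least one R-player
        total += (x - v) * (v * payoff('R', 'P')
                            + (x - v - 1) * payoff('R', 'R')) / (x - 1)
    return total / x
```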

The limit invariant distribution \(\mu ^*\) is a function of the capacity constraints. Hence, we have \(\mu ^*(c_1,c_2) \in \varDelta (\varOmega )\), where \(\varDelta (\varOmega )\) is the set of probability distributions over \(\varOmega \). The payoff function of manager \(k\) is defined by the average of payoffs at location \(k=1,2\) in the long run:

$$\begin{aligned} u_k(c_1,c_2)= \sum _{\omega \in \varOmega }\mu ^*(c_1,c_2)(\omega )\bar{\pi }_k(\omega ), \end{aligned}$$
(6)

where \(\mu ^*(c_1,c_2)(\omega )\) is the probability of \(\omega \) in the limit invariant distribution given \(c_1\) and \(c_2\). We introduce an auxiliary quantity \(c^*=2q^*\) if \(\hat{q} \le \bar{q}\) and \(c^*=\frac{2(2q^*-1)}{2q^*-\hat{q}}\) if \(\hat{q} > \bar{q}\). The NE of the one-shot game between the two managers are provided in Theorem 2.

Theorem 2

Let the payoff function of each manager \(k\) be given by (6), and \(c^{*}\) and \(\varPhi (c_k)\) be as defined above.

  1. (a)

    for any \((c_1,c_2)\) such that \(c_1> \varPhi (c_2)\) and \(c_2<c^*\), or \(c_2>\varPhi (c_1)\) and \(c_1<c^*\), or \(c_1<c^*\) and \(c_2<c^*\), there exists an integer \(\bar{N}\) such that for all \(N>\bar{N}\), \((c_1,c_2)\) is a NE;

  2. (b)

    for any \((c_1,c_2)\) such that \(c_1< \varPhi (c_2)\) and \(c_2>c^*\), or \(c_2<\varPhi (c_1)\) and \(c_1>c^*\), or \(c_1>c^*\) and \(c_2>c^*\), there is no NE corresponding to \((c_1,c_2)\).

Theorem 2 says that, for \(N\) large enough, if the objective function of both managers is the average of the payoffs of the agents at their respective locations in the long run, there are two different classes of NE. In one class, both managers will choose relatively small capacities (\(c_1<c^*\) and \(c_2<c^*\)), which leads to the selection of global coordination on the risk-dominant equilibrium. In the other class, one manager will choose a large capacity and the other a small one (\(c_k> \varPhi (c_\ell )\) and \(c_\ell <c^*\), \(k, \ell =1,2, k\ne \ell \)), which leads to coexistence of conventions in the long run. Further, the strategy profiles in which both managers choose large capacities are not NE. Figure 2 illustrates these results.

Fig. 2 The NE of stage 1 and their corresponding LRE for large population

Using Fig. 2, we explain the intuition for the result that the strategy profiles stated in (a) are NE. Consider the parameter regions which support the coexistence of conventions. The manager of the location coordinating on the Pareto-efficient equilibrium cannot be made better off and hence has no incentive to deviate. Given the capacity choice of this manager, the manager of the other location will not deviate either: Whatever his choice is, agents in his location will always coordinate on the risk-dominant equilibrium. Similarly, in the area where global coordination on the risk-dominant equilibrium is selected, no manager can increase his payoff by changing the capacity of his location.

Now we explain why choosing large capacities for both locations (\(c_1>c^*\) and \(c_2>c^*\)) is not stable. If such a situation were to occur, the LRE would be either the coexistence of conventions or global coordination on the Pareto-efficient equilibrium. However, the manager of the location with the weakly lower payoff would always have an incentive to decrease the capacity of his location, in order to have all the players in his location coordinate on the Pareto-efficient equilibrium.

3 Extensions

This section discusses several extensions of the basic model. The first extension analyzes the capacity choices of the managers when their payoffs are associated with the total payoffs of the agents at their respective locations in the long run. In this case, the managers have to consider both the efficiency of the coordination and the scales of their locations. In the second extension, the managers are allowed a second instrument, a mobility constraint; as a result, each manager can choose both the number of immobile agents and the capacity constraint at his location. The last extension considers the case where the agents have more than two strategies in pure coordination games.

3.1 When scale matters

In this subsection, we consider the case where the managers are concerned with the total payoff of the players at their locations. For instance, in some evaluation systems for economics departments, the total number of academic publications is regarded as an important factor. Hence, if the dean of a department really cares about the evaluation (or his performance is measured by the evaluation), he may want to both improve the efficiency of the coordination on research among the faculty members and increase the number of researchers in the department.

The total payoff of location \(k\) at state \(\omega \) is given by \(x_k(\omega ) \bar{\pi }_k(\omega )\). We define the payoff function of both managers who care about the total payoffs of the agents at their respective locations by

$$\begin{aligned} u_k(c_1,c_2)= \sum _{\omega \in \varOmega }\mu ^*(c_1,c_2)(\omega )x_k(\omega )\bar{\pi }_k(\omega ). \end{aligned}$$
(7)

Proposition 2

Let the payoff function of each manager \(k\) be given by (7). For any \((c_1,c_2) \in [1,2]^2\setminus \{(c^*,c^*)\}\), there exists an integer \(\bar{N}\) such that for all \(N>\bar{N}\), there is no NE corresponding to \((c_1,c_2)\).

Proposition 2 says that, if the managers care about the total payoffs of the agents in their locations in the long run, for \(N\) large enough, the only candidate for a NE of the managers’ game is \((c^*,c^*)\). The reason that the other strategy profiles cannot be NE is the following. In the region where the coexistence of conventions is selected, the manager of the location in which agents coordinate on the risk-dominant equilibrium has an incentive to decrease \(c_k\) to the extent that both locations will coordinate on the risk-dominant equilibrium, because this increases the expected population size of his location. In the region where agents in both locations coordinate on the risk-dominant equilibrium, each manager has an incentive to increase \(c_k\), which may either increase the expected population size or lead his location to coordinate on the Pareto-efficient equilibrium.

The only exception is the point \((c^*,c^*)\). For \(N \rightarrow \infty \), \((c^*,c^*)\) is a NE if the payoff of the Pareto-efficient equilibrium is high enough. The reason is that, given this strategy profile, in each location, both the Pareto-efficient and the risk-dominant equilibria can be selected in the long run, whereas any unilateral deviation of a manager will lead his location to coordinate only on the risk-dominant equilibrium. For any fixed \(N\), whether or not this point is a NE crucially depends on the parameters of the model, and we cannot provide a general result.

3.2 Mobility constraints

In this subsection, we consider the optimal policies of the managers when there is an additional instrument, a mobility constraint. In reality, a mobility constraint can be specified in an employment contract. For instance, in airline companies, pilots are usually required by contract to work for the same company for a relatively long time (e.g., two decades), and leaving before the stated time incurs a large penalty.

Denote by \(p_k\) the mobility constraint of location \(k\), such that \(\lceil p_k N \rceil \) specifies the number of immobile players in each period. Since each location has \(N\) agents in the beginning, we assume that each manager can restrict the mobility of at most \(N\) players (i.e., \(p_k \le 1\)). Naturally, manager \(k\) cannot restrict the mobility of the agents outside location \(k\). A strategy of manager \(k\) is then a pair \((c_k, p_k) \in [1,2]\times [0,1]\).

Given the capacity and mobility constraints, the effective capacity of location \(k\) can be represented by \(M_k \equiv \lfloor d_kN \rfloor \) with \(d_k=\min \{c_k, 2-p_\ell \}\), and the minimum number of players in location \(k\) is \(m_k \equiv 2N-M_\ell \), for \(k,\ell \in \{1,2\}, k \ne \ell \). That is, a location reaches its maximum number of agents either by hitting its capacity constraint or by hosting all agents apart from the immobile ones in the other location. For example, if \(c_2=1.8\) and \(p_1=0.5\), then \(d_2=\min \{1.8, 1.5\}=1.5\): Location 2 can effectively hold at most \(\lfloor 1.5N \rfloor \) agents, however large its capacity is.

Since \(d_k \in [1,2]\), replacing \(c_k\) with \(d_k\) \((k=1,2)\) in Lemma 1 leaves the characterization of the absorbing sets of the unperturbed dynamics unchanged. Further, with the same replacement, the LRE are the same as in Theorem 1 and Proposition 1.

Now we explore the policy choice of each manager. We first consider the case where the payoff function of each manager \(k\) is the average of the agents’ payoffs in his location in the long run.

Proposition 3

Let the payoff function of each manager \(k\) be given by (6), \(\varPhi (d_k)\) be as defined above, and \(d^*=2q^*\) if \(\hat{q} \le \bar{q}\) and \(d^*=\frac{2(2q^*-1)}{2q^*-\hat{q}}\) if \(\hat{q} > \bar{q}\).

  1. (a)

    for any \((d_1,d_2)\) such that \(d_1<d^{*}\) or \(d_2<d^{*}\), there exists an integer \(\bar{N}\) such that for all \(N>\bar{N}\), \((d_1,d_2)\) corresponds to at least one NE, provided that \(d_1 \ne \varPhi (d_2)\) and \(d_2 \ne \varPhi (d_1)\);

  2. (b)

    for any \((d_1,d_2)\) such that \(d_1>d^{*}\) and \(d_2>d^{*}\), there exists an integer \(\bar{N}\) such that for all \(N>\bar{N}\), there is no NE corresponding to \((d_1,d_2)\).

If the managers can choose both the capacity and the mobility constraints, and care about the average of the agents’ payoffs in their locations in the long run, the parameter regions in the \((d_1,d_2)\)-space that correspond to NE are similar to those when the managers can choose capacity constraints only. The difference is that the strategy profiles projected on the two areas \(d^* < d_1 < \varPhi (d_2)\) and \(d^* < d_2 < \varPhi (d_1)\) are now NE. Consider an arbitrary point in the first area. Manager 1 cannot be made better off by deviating, because agents in location 1 coordinate on the Pareto-efficient equilibrium. In order to lead his own location to coordinate on the Pareto-efficient equilibrium, manager 2 would like to increase \(d_2\) by choosing a higher \(c_2\). However, his effort may not be effective, for \(d_2\) is determined by \(\min \{c_2, 2-p_1\}\): If \(d_2=2-p_1\), increasing \(c_2\) cannot change \(d_2\) at all. A similar argument holds for any point in the second area.

Now we explore the situation where the managers are concerned with the total payoffs of the agents in their respective locations in the long run. Proposition 4 summarizes the result.

Proposition 4

Let the payoff function of each manager \(k\) be given by (7), and \(d^*\) and \(\varPhi (d_k)\) be as defined above.

  1. (a)

    \(d_1=d_2=1\) corresponds to at least one NE;

  2. (b)

    for any \((d_1,d_2) \in [1,2]^2 \setminus (\{(1,1)\} \cup \{(d^{*},d^{*})\})\), there exists an integer \(\bar{N}\) such that for all \(N>\bar{N}\), there is no NE corresponding to \((d_1,d_2)\).

The only difference between this result and Proposition 2 is that now \(d_1=d_2=1\) corresponds to at least one NE. If the managers can only choose capacity constraints, \(c_1=c_2=1\) is not a NE, because both managers have an incentive to increase their respective capacities in order to have a larger expected population size. However, since the managers can now also choose mobility constraints, if they care about the number of agents in their locations, both have an incentive to raise the mobility constraint to prevent the agents from moving out. As a result, both managers will set \(p_k=1\), so that no agents can change locations. This result confirms the argument in Anwar (2002, footnote 7).

When the managers can choose both capacity and mobility constraints, Anwar’s (2002) result corresponds to the LRE on the diagonal of the \((d_1,d_2)\)-space (the segment connecting \((1,1)\) and \((2,2)\), but not including \((2,2)\)), and Ely’s (2002) result corresponds to the LRE at \((2,2)\). Hence, Propositions 3 and 4 show that the assumptions on capacity and mobility constraints in Ely (2002) and Anwar (2002) are not the optimal choices of the managers.

3.3 Pure coordination game with more than two strategies

This subsection considers the case where the basic pure coordination game has more than two strategies. Consider the game \(G=\left[ I, \{S^i\}_{i \in I}, \{\pi ^i\}_{i \in I}\right] \), where \(I=\{1,2\}\), \(S^i=\{s_1, s_2, \ldots , s_n\}\), and \(\pi ^i(s^i,s^j)=\tau \) if \(s^i=s^j=s_\tau \), and \(\pi ^i(s^i,s^j)=0\) otherwise. Hence, \(G\) is an \(n\times n\) pure coordination game, where the payoff for mis-coordination is zero. Straightforwardly, any \((s_\tau ,s_\tau )\) for \(\tau \in \{1,2,\ldots ,n\}\) is a strict NE of this game. The rest of the model setup remains the same.
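
For instance, for \(n=3\), the payoff matrix of either player is

$$\begin{aligned} \left( \pi ^i(s^i,s^j)\right) = \begin{pmatrix} 1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3 \end{pmatrix}, \end{aligned}$$

so that \((s_3,s_3)\) is both the Pareto-efficient equilibrium and, in every pairwise comparison of strategies, the risk-dominant one.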

Hence, for \(d_1<2\) and \(d_2<2\), the unperturbed learning dynamics has \(n^2\) absorbing sets, in which the agents in each location coordinate on a strict NE. For \(d_1=d_2=2\), there are \(2n\) absorbing sets, in which one location is empty and the other location has all the agents coordinating on a strict NE. For \(d_k=2\) and \(d_\ell <2\), the unperturbed dynamics has \(n+n(n-1)/2\) absorbing sets: There are \(n\) absorbing sets where all the agents agglomerate in location \(k\) and coordinate on a strict NE, and there are \(n(n-1)/2\) absorbing sets where the agents in location \(k\) coordinate on a less efficient equilibrium than those in location \(\ell \). We single out the LRE by comparing the minimum transition costs toward these absorbing sets and summarize the result in Proposition 5.

Proposition 5

If the basic interaction among the agents is the \(n \times n\) coordination game \(G\) defined above, the LRE of the learning dynamics in stage 2 of the model are as follows.

  1. (a)

    For \(d_1<2\) and \(d_2<2\), the LRE are the states in \(\varOmega (s_n,s_n)\).

  2. (b)

    For \(d_1=2\) and \(d_2<2\), the LRE is the state in \(\varOmega (s_n,0)\). Similarly, for \(d_1<2\) and \(d_2=2\), the LRE is the state in \(\varOmega (0,s_n)\).

  3. (c)

    For \(d_1=2\) and \(d_2=2\), the LRE are the states in \(\varOmega (s_n,0)\) and \(\varOmega (0,s_n)\),

where \(\varOmega (s_n,s_n)\) is the absorbing set where agents in both locations play \(s_n\), and \(\varOmega (s_n,0)\) (\(\varOmega (0,s_n)\)) is the absorbing set where all the agents agglomerate in location 1 (2) and play \(s_n\).

Proposition 5 says that only the strict NE \((s_n,s_n)\) will be selected in the long run. The reason is that \((s_n,s_n)\) is both Pareto-efficient and risk-dominant and hence has a larger basin of attraction compared with any other strict NE. That is, given a location, the transition from any other strict NE to \((s_n,s_n)\) always requires fewer mutants than the one in the opposite direction.

Then, we consider the managers’ game in stage 1. We focus on the case where the managers are concerned with the total payoffs of their respective locations in the long run, and explore two alternative strategy sets for the managers.

Proposition 6

Let the payoff function of each manager \(k\) be given by (7). The NE of the managers’ game in stage 1 are as follows:

  1. (a)

    If both managers can only choose capacity constraint, \(c_1=c_2=2\) corresponds to the unique NE.

  2. (b)

    If both managers can choose the capacity and the mobility constraints, \(d_1=d_2=1\) corresponds to at least one NE.

The intuition is that, if the managers can choose capacity constraints only, they always have an incentive to increase the capacities of their respective locations. As a result, each location becomes large enough to accommodate all the agents. If the managers can also choose mobility constraints, they have an incentive both to increase the capacity and to restrict mobility. However, since the effective capacity of location \(k\) is determined by \(\min \{c_k, 2-p_\ell \}\), the effect of restricting mobility dominates that of increasing capacity. In the end, the managers will completely forbid migration. Regardless of the choices available to the managers, given the basic interaction \(G\), the agents will always coordinate on the Pareto-efficient NE \((s_n,s_n)\).

4 Conclusion

In this paper, we develop a simple model to study long-run technology choice with endogenous local capacity constraints. We introduce a manager for each location and allow the managers to choose capacity constraints (and mobility constraints). Given these constraints, the agents are involved in a learning dynamics of a coordination game. We assume that the managers are rational, while the agents are boundedly rational, which gives rise to a model of “asymmetric rationality.” The managers make decisions first, with perfect knowledge of the effect of different policies on the long-run outcomes, and the agents take the constraints as given and repeatedly choose strategies of a coordination game. To our knowledge, this is one of the few studies that explore the sequential interaction among agents with different rationality levels within the framework of stochastic learning in games.

Clearly, the strategy set and the objective function of the managers have significant effects on the equilibrium selection of coordination games in the long run. Within our framework, we consider two alternative strategy sets and investigate two different objective functions. If the managers care about efficiency, there is a set of NE leading to either the coexistence of conventions or global coordination on the risk-dominant equilibrium. If the managers also care about scale, they may end up with a medium capacity when they can choose capacity constraints only, and they may completely forbid migration of the agents when they are also allowed to choose mobility constraints. In either case, the symmetric policy arrangements exogenously given in Ely (2002) and Anwar (2002) are not stable. Hence, our work tests the validity of the assumptions on policy constraints that have been considered in the related literature and also demonstrates how policy adjustments may change the long-run outcomes.

There are many situations in social and economic activities where agents with different rationality levels interact with each other. Hence, in our opinion, further research should focus on developing more realistic models to analyze such interactions in different contexts. A deeper understanding of these issues will allow us to obtain better insights into the consequences of the interactions among heterogeneous agents. This paper takes a step forward by illustrating how the choices of capacity constraints interact with the long-run technology convention in a non-trivial way; hence, it is necessary to explicitly treat policy parameters and their optimality as important factors in the establishment of social conventions in an organized society.