Keywords

JEL Classifications

A growing literature in economics examines the formation of networks and complements a rich literature in sociology and recently emerging literatures in computer science and statistical physics. Research on network formation is generally motivated by the observation that social structure is important in a wide range of interactions, including the buying and selling of many goods and services, the transmission of job information, decisions on whether to undertake criminal activity, and informal insurance networks.

Networks are often modelled using tools and terminology from graph theory. Most models of networks view a network as either a non-directed or a directed graph; which type of graph is more appropriate depends on the context. For instance, if a network is a social network of people and links represent friendships or acquaintances, then it would tend to be non-directed. Here the people would be modelled as the nodes of the network and the relationships would be the links. (In terms of a graph, the people would be vertices and the relationships would be edges.) If, instead, the network represents citations from one article to another, then each article would be a node and the links would be directed, as one article could cite another. While many social and economic relationships are reciprocal or require the consent of both parties, there are also enough applications that take a directed form, so that both non-directed and directed graphs are useful as modelling tools.

Models of how networks form can be roughly divided into two classes. One derives from random graph theory, and views an economic or social relationship as a random variable. The other views the people (or firms or other actors involved) as exercising discretion in forming their relationships, and uses game theoretic tools to model formation. Each of these techniques is discussed in turn.

Models of Random Networks

Bernoulli Random Graphs

Some of the earliest formal models used to understand the formation of networks are random graphs: the canonical example is that of a pure Bernoulli process of link formation (for example, see the seminal study of Erdös and Rényi 1960). For instance, consider a network where the (non-directed) link between any two nodes is formed with some probability p (where 1 > p > 0), and this process occurs independently across pairs of nodes. While such a random method of forming links allows any network to potentially emerge, some networks are much more likely to do so than others. Moreover, as the number of nodes becomes large, there is much that can be deduced about the structure the network is likely to take, as a function of p. For instance, one can examine the probability that the resulting network will be connected in the sense that one can find a path (sequence of links) leading from any given node to any other node. We can also ask what the average distance will be in terms of path length between different nodes, among other things. As Erdös and Rényi showed, such a random graph exhibits a number of ‘phase’ transitions as the probability of forming links, p, is varied in relation to the number of nodes, n; that is, resulting networks exhibit different characteristics depending on the relative sizes of p and n.

Whether or not such a uniformly random graph model is a good fit as a model of network formation, it is of interest because it indicates that networks with different densities of links might tend to have very different structures and also provides some comparisons for network formation processes more generally. Some of the basic properties that such a random graph exhibits can be summarized as follows. When p is small in relation to n, so that p < 1/n (that is, the average number of links per node is less than one), then with a probability approaching 1 as n grows the resulting graph consists of a number of disjointed and relatively small components, each of which has a tree-like structure. (A component of a network is a subgraph, so that each node in the subgraph can be reached from any other node in the subgraph via a path that lies entirely in the subgraph, and there are no links between any nodes in the subgraph and any nodes outside the subgraph.) Once p is large enough in relation to n, so that p > 1/n, then a single ‘giant component’ emerges; that is, with a probability approaching 1 the graph consists of one large component, which contains a nontrivial fraction of the nodes, and all other components are vanishingly small in comparison. Why there is just one giant component and all other components are of a much smaller order is fairly intuitive. In order to have two ‘large’ components each having a nontrivial fraction of n nodes, there would have to be no links between any node in one of the components and any node in the other. For large n, it becomes increasingly unlikely to have two large components with absolutely no links between them. Thus, nontrivial components mesh into a giant component, and any other components must be of a much smaller order. As p is increased further, there is another phase transition when p is proportional to log(n)/n. This is the threshold at which the network becomes ‘connected’ so that all nodes are path-connected to each other and the network consists of a single component. Once we hit the threshold at which the network becomes connected, we also see further changes in the diameter of the network as we continue to increase p relative to n. (The diameter is the maximal distance between two nodes, where distance is the minimal number of links that are needed to pass from one node to another.) Below the threshold, the diameter of a giant component is of the order of log(n), then at the threshold of connectedness it hits log(n)/loglog(n), and it continues to shrink as p increases.

Similar properties and phase transitions have been studied in the context of other models of random graphs. For example, Molloy and Reed (1995), among others (see Newman 2003), have studied component size and connectedness in a ‘configuration model’. There, a set of nodes is given together with the number of links that each node should have, and then links are randomly formed to leave each node with the pre-specified number of links.

Clustering and Markov Graphs

Although the random graphs of Erdös and Rényi are a useful starting point for modelling network formation, they lack many characteristics observed in most social and economic networks. This has led to a series of richer random graph-based models of networks. The most basic property that is absent from such random networks is that the presence of links tends to be correlated. For instance, social networks tend to exhibit significant clustering. Clustering refers to the following property of a network. If we examine triples of nodes so that two of them are each connected to the third, what is the frequency with which those two nodes are linked to each other? This tends to be much larger in real social networks than one would see in a Bernoulli random graph. On an intuitive level, models of network formation where links are formed independently tend to look too much like ‘trees’, while observed social and economic networks tend to exhibit substantial clustering, with many more cycles than would be generated at random (see Watts 1999, for discussion and evidence).

Frank and Strauss (1986) identified a class of random graphs that generalize Bernoulli random graphs, which they called ‘Markov graphs’ (also referred to as p* networks). Their idea was to allow the chance that a given link forms to be dependent on whether or not neighbouring links are formed. Specific interdependencies require special structures, because, for instance, making one link dependent on a second, and the second on the third, can imply some interdependencies between the first and third. These sorts of dependencies are difficult to analyse in a tractable manner, but nevertheless some special versions of such models have been useful in statistical estimation of networks.

Small Worlds

Another variation on a Bernoulli network was explored by Watts and Strogatz (1998) in order to generate networks that exhibit both relatively low distances (in terms of minimum path length) between nodes and relatively high clustering – two features that are present in many observed networks but not in the Bernoulli random graphs unless the number of links per node (p(n − 1)) is extremely high. They started with a very structured network that exhibits a high degree of clustering. Then, by randomly rewiring enough (but not too many) links, one ends up with a network that has a small average distance between links but still has substantial clustering. While such a rewiring process results in networks that exhibit some of the features of social networks, it leads to networks that miss out on other basic characteristics that are present in many social networks. For example, the nodes of such a network tend to be too similar in terms of the number of links that they each have.

Degree Distributions

One fundamental characteristic of a social network is a network’s degree distribution. The degree of a node is the number of links it has, and the degree distribution keeps track of how varied the degree is across the nodes of the network. That is, the degree distribution is simply the frequency distribution of degrees across nodes. For instance, in a friendship network some individuals might have only a few friends while other individuals might have many, and then the degree distribution quantifies this information.

Price (1965) examined a network of citations (between scientific articles), and found that the degree distribution exhibited ‘fat tails’ compared with what one would observe in a Bernoulli random graph; that is, there was a higher frequency of articles that had many citations and a higher frequency of articles that had no citations than should be observed if citations were generated independently. In fact, many social networks exhibit such fat tails, and some have even been thought to exhibit what is known as a ‘scale-free’ degree distribution or said to ‘follow a power law’. A scale-free distribution is one where the frequency of degrees can be written in the form f(d) = adb, for some parameters a and b, where d is the degree and f(d) is the relative frequency of nodes with degree d. Such distributions date to Pareto (1896), and have been observed in a variety of other contexts ranging from the distribution of wealth in a society to the relative use of words in a language. Price (1976) adapted ideas from Simon (1955) to develop a random link formation process that produces networks with such degree distributions. A similar model was later studied by Barabási and Albert (2001), who called the process of link formation ‘preferential attachment’. The idea is that nodes gain new links with probabilities that are proportional to the number of links they already have (which is closely related to a lognormal growth process). In a system where new nodes are born over time, this process generates scale-free degree distributions.

A simple preferential attachment model also has its limitations. One is that most social networks do not in fact have degree distributions that are scale-free. Observed degree distributions tend to lie somewhere between the extremes of a scale-free distribution and that corresponding to an independent Bernoulli random graph (sometimes known as a Poisson random graph for its approximate degree distribution). Second, the preferential attachment model fails to produce the type of clustering observed in many social networks, just as Bernoulli random graphs do. This has led to the construction of hybrid models that allow for richer sets of degree distributions, as well as clustering and correlation in degrees, and allows for the structural fitting of random graph based network formation models to data (for example, see Jackson and Rogers 2007, and the discussion there).

Strategic Models of Network Formation

Strategic models of network formation have emerged from the economics literature, and offer a very different perspective from that seen in random graph models, and a complementary set of insights (see Jackson 2006, for comparison and discussion). The starting point for a game theoretic approach is to assume that the nodes are active discretionary agents or players who get payoffs that depend on the social network that emerges. For example, if nodes are countries and links are political alliances, or nodes are firms and links are trading or collaboration agreements, then the relationships are entered into with some care and thought. Even in modelling something like a friendship network, while individuals might not be directly calculating costs and benefits from the relationship, they do react to how enjoyable or worthwhile the relationship is and might tend to spend more effort or time in relationships that are more beneficial and avoid ones that are less so. Different social networks lead to different outcomes for the involved agents (for example, different trades, different access to information or favours, and so on). Links are then formed at the discretion of the agents, and various equilibrium notions are used to predict which networks will form. This differs from the random models not only in that links result as a function of decisions rather than at random, but also in that there are natural costs and benefits associated with networks which then allow a welfare analysis.

Some of the first models to bring explicit utilities and choice to the formation of social links were in the context of modelling the trade-offs between ‘strong’ and ‘weak’ ties (links) in labour contact networks. Such models by Boorman (1975) and Montgomery (1991) explored a theory, due to Granovetter (1973), about different strengths of social relationships and their role in finding employment. Granovetter observed that when individuals obtained jobs through their social contacts, while they sometimes did so through strong ties (people whom they knew well and interacted with on a frequent basis), they also quite often obtained jobs through weak ties (acquaintances whom they knew less well and/or interacted with relatively infrequently). This led Granovetter to coin the phrase ‘the strength of weak ties’. Boorman’s article and Montgomery’s articles provided explicit models where costs and benefits could be assigned to strong and weak ties, and trade-offs between them could be explored.

In a very different setting, another use of utility functions involving networks emerged in the work of Myerson (1977). Myerson analysed a class of cooperative games that were augmented with a graph structure. In these games the only coalitions that could produce value are those that are pathwise connected by the graph, and so such graphs indicate the possible cooperation or communication structures. This approach led Myerson to characterize a variation on the Shapley value, now called the Myerson value, which was a cooperative game solution concept for the class of cooperative games where constraints on coalitions were imposed by a graph structure. Although the graphs in Myerson’s analysis are tools to define a special class of cooperative games, they allow the graph structure to influence the allocation of societal value among a set of players. Aumann and Myerson (1988), recognizing that different graph structures led to different allocations of value, used this to study a game where the graph structure was endogenous. They studied an extensive form game where links are considered one by one according to some exogenous order, and formed if both agents involved agree. While that game turns out to be hard to analyse even in three-person examples, it was an important precursor to the more recent economic literature on network formation.

In contrast to the cooperative game setting, Jackson and Wolinsky (1996) explicitly considered networks, rather than coalitions, as the primitive. Thus, rather than deducing utilities indirectly through a cooperative game on a graph, they posited that networks were the primitive structure and agents derived utilities based on the network structure in place. So, once a social network structure is in place, one can then deduce what the agent’s payoffs will be. Using such a formulation where players’ payoffs are determined as a function of the social network in place, it is easy to model network formation using game theoretic techniques.

Pairwise Stability

In modelling network formation from a game theoretic perspective, one needs to have some notion of equilibrium or stable networks. Since it is natural to require mutual consent in many applications, standard Nash equilibrium based ideas are not very useful. For instance, consider a game where each agent simultaneously announces which other agents he or she is willing to link to. It is always a Nash equilibrium for each agent to say that he or she does not want to form any links, anticipating that the others will do the same. Generally, this allows for a multiplicity of equilibria, many of which make little sense from a social network perspective. Even equilibrium refinements (such as undominated Nash or perfect equilibrium) do not avoid this problem. Given that it is natural in a network setting for the agents prospectively forming a link to be able to communicate with each other, they should also be able to coordinate with each other on the forming of a link. An approach taken by Jackson and Wolinsky (1996) is to define a stability notion that directly incorporates the mutual consent needed to form links. Jackson and Wolinsky (1996) defined the following notion of ‘pairwise stability’: a network is pairwise stable if (i) no player would be better off if he or she severed one of his or her links, and (ii) no pair of players would both benefit (with at least one of the pair seeing a strict benefit) from adding a link that is not in the network. The requirement that no player wishes to delete a link that he or she is involved in implies that a player has the discretion to unilaterally terminate relationships that he or she is involved in. The second part of the definition captures the idea that if we are at a network where the creation of a new link would benefit both players involved, then the network g is not stable, as it will be in the players’ interests to add the link.

Pairwise stability is a fairly permissive stability concept – for instance, it does not consider deviations where players delete some links and add others at the same time. While pairwise stability is easy to work with and often makes fairly pointed predictions, the consideration of further refinements can make a difference. A variety of refinements and alternative notions have been introduced, including allowing agents to form and sever links at the same time, allowing coalitions of agents to add and sever links in a coordinated fashion, or behaviour where agents anticipate how the formation of one link might influence others to form further links (see Jackson 2004, for discussion and references). There are also dynamic models (for example, Watts 2001) in which the possibility of forming links arises (repeatedly) over time, and agents might ‘tremble’ when they form links (see Jackson 2004, for references). These various equilibrium/stability concepts have different properties and are appropriate in different contexts.

With pairwise stability, or some other solution in hand, one can address a series of questions. One fundamental question is whether, from society’s point of view, efficient or optimal networks will be stable when agents form links with their selfish interests in mind. Given that transfers are being considered here, one natural definition of an ‘efficient’ or ‘optimal’ network is one that maximizes the total value or the sum of utilities of all agents in the society. Another basic question is to ask whether in situations where no efficient network is pairwise stable, is it possible for some sort of intervention (for example, in the form of taxing or subsidizing links), to lead efficient networks to form.

A Connections Model of Social Networks

One stylized example from Jackson and Wolinsky (1996) gives some feeling for the issues involved in the above questions and is useful for illustrating the relationship between efficient and pairwise stable networks. Jackson and Wolinsky called this example the ‘symmetric connections model’, in which the links represent social relationships between players such as friendships. These relationships offer benefits in terms of favours, information, and so on, and also involve some costs. Moreover, players benefit from having indirect relationships. A ‘friend of a friend’ produces benefits or utility for a player, although of a lesser value than the direct benefits that come from a ‘friend’. The same is true of ‘friends of a friend of a friend’, and so forth. Benefit deteriorates in the ‘distance’ of the relationship, as represented by a factor δ between 0 and 1, which indicates the benefit from a direct relationship between two agents and is raised to higher powers for more distant relationships. For instance, in the network where player 1 is linked to 2, 2 is linked to 3, and 3 is linked to 4; player 1 gets a benefit of δ from the direct connection with player 2, an indirect benefit of δ2 from the indirect connection with player 3, and an indirect benefit of δ3 from the indirect connection with player 4. For δ < 1 this leads to a lower benefit from an indirect connection than a direct one. Players also pay some cost c for maintaining each of their direct relationships (but not for indirect ones). Once the benefit parameter, δ, and the cost parameter, c > 0 are specified, it is possible to determine each agent’s payoff from every possible network, allowing a characterization of the pairwise stable networks as well as the efficient networks. The efficient network structures are the complete network if c < δδ2, a ‘star’ (a network where one agent is connected to each other agent and there are no other connections) encompassing all nodes if \( \delta -{\delta}^2<c<\delta +\frac{\left(n-2\right)}{2}{\delta}^2 \), and the empty network if \( \delta +\frac{\left(n-2\right)}{2}{\delta}^2<c \). The idea is that if costs are very low it will be efficient to include all links in the network, because shortening any path leads to higher payoffs. When the link cost is at an intermediate level, then the unique efficient network structure is to have all players arranged in a star network, since such a structure has the minimal number of links (n − 1) needed to connect all individuals, and yet still has all nodes within at most two links from one another. Once links become so costly that a star results in more cost than benefit, then the empty network is efficient. One can also examine a directed version of such a model, as in Bala and Goyal (2000), who find related results, but with some differences that depend on whether both agents or just one of the agents enjoys the benefits from a directed link.

Inefficiency of Stable Networks

The set of pairwise stable networks does not always coincide with the efficient ones, and sometimes do not even intersect with the set of efficient networks. For instance, if the cost of a link is greater than the direct benefit (c > δ), then relationships are only valuable to a given agent if they generate indirect benefits as well as direct ones. In such a situation a star is not pairwise stable since the centre player gets benefit of the direct value from each of his or her links, which is less than the cost of each of those links. This model of social networks makes it obvious that there will be situations where individual incentives are not aligned with overall societal benefits.

As it will generally be the case that in economic and social networks there are some sort of externalities present, since two agents’ decisions of whether or not to form a relationship can affect the well-being of other agents, one should expect that there will be situations where the networks formed through the selfish decisions of the agents do not coincide with those that are efficient from society’s perspective. In such situations, it is natural to ask whether intervention in the form of transfers among agents might help align individual and overall societal incentives to form the right network. For instance, in the connections model, it would make sense to have the peripheral agents in a star pay the centre of the star in order to maintain their links. The peripheral agents benefit much more from the relationship with the centre agent than vice versa, as the centre agent provides access to many indirect agents. Although a simple set of transfers can align individual and overall incentives in the connections model, it is impossible to always correct this tension between individual incentives and overall efficiency by taxing and subsidizing agents for the links they form (even in a complete information setting). The fact that there are very simple, natural network settings where no ‘reasonable’ set of transfers can help rectify the disparity stability and efficiency was shown in Jackson and Wolinsky (1996). Without providing details, the impossibility of reconciling stability and efficiency stems from the following considerations: from any given network, there are many other networks that can be reached. In fact, if there are n nodes, then there are n (n − 1)/2 possible links that can be added to or deleted from any given network. In order to ensure that a given efficient network is pairwise stable, payoffs to all neighbouring networks have to be configured so that no agent finds it in his or her interest to delete a link and no two agents find it in their interests to add a link. It is impossible to assign all the necessary taxes and subsidies in such a way that (i) the transfers are feasible (and are not given to unattached agents), (ii) identical agents are treated identically, and (iii) it is always the case that at least one efficient network is pairwise stable.

Much more has been learned about the relationship between stable and efficient networks and possible transfers to ensure that efficient networks form. For instance, one can characterize some classes of settings where the efficient networks and the stable ones coincide (see Jackson and Wolinsky 1996). One can also design transfers that ensure that some efficient network is stable by treating agents unequally (for example, taxing or subsidizing them differently even though the agents are identical in the problem as shown by Dutta and Mutuswami 1997). Another important point was made by Currarini and Morelli (2000), who showed that if agents bargain over the division of payoffs generated by network relationships at the time when they form link, then in a nontrivial class of settings equilibrium networks are efficient. While the conclusions hinge on the structure of the link-formation-bargaining game, and in particular on an asymmetry in bargaining power across the agents, such a result tells us that it can be important to model the formation of the links of a network together with any potential bargaining over payoffs or transfers. Further study in this area shows how the types of transfers needed to reach efficient networks relate to the types of network externalities that are present in the setting.

Small Worlds and Strategic Network Formation

Beyond understanding the relationship between stable and efficient networks, strategic models of network formation have also shed light on some empirical regularities and helped predict which networks will arise in settings of particular interest. For instance, strategic models of network formation provide substantial insight into the ‘small-worlds’ properties of social networks: the simultaneous presence of high clustering (a high density of links on a local level) and short average path length between nodes (see Jackson 2006, for references). The reasoning is based on a premise that different nodes have different distances from each other, either geographically or according to some other characteristic, such as profession, tastes, and so on. The low cost of forming links to other nodes that are nearby then naturally explains high clustering. High benefits from forming links that bridge disparate parts of the network, due to the access and indirect connections that they bring, naturally explain low average path length.

Networks and Markets

There is a rich set of studies of markets and networks from an economics perspective, including models that explicitly examine whether or not buyers and sellers have incentives to form an efficient network of relationships (for example, Kranton and Minehart 2001). The incentives to form efficient networks depend on the setting and which agents bear the cost of forming relationships. In some settings competitive forces lead to the right configuration of links, and in others buyers and sellers over-connect in order to improve their relative bargaining positions. Other studies focus on the context of specific markets, such as labour markets, where people benefit from connections with neighbours who provide information about job opportunities (see Ioannides and Loury 2004, for an overview and references).

In addition to studies of networks of relationships between buyers and sellers, firms also form relationships amongst themselves that affect their costs and the sets of products they offer. Such oligopoly settings where network formation is important (see Bloch 2004, for a recent survey), again provide a rich set of results regarding the structure of networks that emerge, and contrasts between settings where efficient networks naturally emerge and others where only inefficient networks are formed.

Network formation has also been studied in the context of many other applications, including risk-sharing in developing countries, social mobility, criminal activity, international trade and banking deposits.

Finally, there have been a number of experiments on network formation, using human subjects. These examine a variety of questions, ranging from how forward-looking agents are when they form social ties, to whether or not agents overcome coordination problems when forming links, to whether there are pronounced differences between network formation when links can be formed unilaterally as opposed to when they require mutual consent, to whether efficient networks will tend to result and how that depends on symmetries or asymmetries in the efficient network structure (see Falk and Kosfeld 2003, for some discussion and references).

See Also