Keywords

1 Introduction

A large body of research aims to detect the spread of something through a social network. Be it voting behavior [18], news [12], new technology [8], disease [5], or patronage benefits [7], social networks are thought to matter because their ties—the relationships connecting pairs of people—serve as conduits that can spread ideas, information, goods, germs, and social judgment.

When researchers conduct an empirical test of this kind of spread, they collect a measure of the relevant social network [13]. Because “social relationship” is nebulous, the standard approach is to measure a few concrete types of social interactions. For instance, [8] use surveys to inquire about respondents’ friends, family, potential money lenders, and potential problem solvers. Jennifer et al. [12] ask respondents about seven types of interactions, including visits to others’ homesteads and shared meals. Banerjee [4] as about twelve, including borrowing rice and kerosene.

With multiple measures of social ties in hand, researchers typically take the union of these ties to construct one social network to be used for their analyses [3, 4, 8, 11, 12]. A tie in this aggregate network indicates that one or both of the tied individuals (depending on the coding rule the authors employ) reported having at least one of the measured relationships with the other. The logic that drives researchers to aggregate all measured ties is straightforward: each additional type of interaction contains weakly more information about the social relationship between two people, so the more interactions included, the better the measure of the social network.

However, we argue that this logic does not necessarily hold for all instances of something spreading through a network (“diffusion”). Some things may indeed diffuse generically, in which case one type of tie is as good as another for their spread. On the other hand, things may diffuse specifically along only certain kinds of ties. Take, for instance, the diffusion of useful, factual information, such as that a university is likely to declare a snow day tomorrow. We may expect this information to spread equally well along a tie that indicates shared membership in the rugby club and along a tie that indicates a twelve-year friendship. Now consider the diffusion of more sensitive information, such as that a student is considering reporting a professor for misconduct. In the latter case, students may opt to use their more trusted ties exclusively; one type of social tie would not be as good as another.

In short, ties that indicate a different kind of interaction between two people may also work differently to spread something, depending on the context. When diffusion is tie-specific, aggregating ties effectively adds measurement error to the network which, in an analogue to the regression context, can attenuate and mask true effects.

We begin by showing that different measures of social interactions among the same group of people can in fact pick up quite distinct views of the social network, using data from [12]. Then we demonstrate the problem of aggregating ties using a set of Erdős-Rényi random networks that vary in key ways. We show that when diffusion is specific, aggregating networks substantially attenuates the estimate of true effects. We further show that whether aggregation problems will arise depends on two factors: the amount of new information that that a candidate new tie type would add—a property we call “network overlap”—and the relative sizes of the networks being aggregated.Footnote 1

Our results highlight the importance of careful theory and knowledge of context to underpin a network study. Determining whether the diffusion at hand is likely to be specific or generic relies heavily on theory and a deep qualitative understanding of local context. Without a guiding understanding of how something should, in principle, pass from person to person, researchers risk failing to detect it empirically.Footnote 2

2 Empirical Studies of Social Network Diffusion

Scholars are interested in “network effects” when they believe ties in a network can transmit something. In principle, theory should dictate precisely which ties are of interest. When theory is strong and the tie of theoretical interest has one clear operationalization, researchers are able to measure only a single type of tie. For instance, to study peer influence in adopting insurance among rice farmers, [6] measure one link type: household heads’ close friends with whom they most frequently discuss rice production or other financial issues. To examine the advantage conferred to politicians by familial ties, [7] gather data exclusively on marriage connections between families in the Philippines.

Measuring a single tie type is the exception rather than the rule. When theory is not strong enough to identify one precise tie type that should be interest, or when a single type of tie in concept does not have a single obvious operationalization, researchers collect data on multiple types of ties.

For instance, in their study of farmers’ decisions about sunflower crops in Mozambique, [3] aim to detect a relationship between one’s peers’ decisions and one’s own by recording respondents’ family members, friends, and neighbors. To study whether peers affect one’s choice to give deworming medication to children in Kenya, [11] ask respondents to name the five friends and the five relatives they speak to most frequently, other social contacts whose children attend the local schools, and people with whom they speak about child health matters.

To learn about how social information spreads through Indonesian hamlets, [1] record both blood relatives and shared membership in social organizations. Jennifer et al. [12] measure the channels through which news may spread from person to person by asking Ugandan villagers about people with whom they spend time, share meals, exchange household visits, discuss religion, discuss politics, share secrets, and speak on the phone. Banerjee [4] measure twelve types of ties that may be responsible for spreading the word about a microfinance program: people who visit the respondent’s home, people the respondent visits, kin, non-relatives with whom the respondent socializes, sources of borrowed money, sources of borrowed material goods, potential recipients of lent money, potential recipients of lent material goods, sources of medical advice, sources of general advice, receivers of advice, and prayer partners. Ferrali et al. [8] operationalize the ties that may spread information about new technology with friendship, family, potential money lenders, and potential solvers of problems about public services.

In all of the above cases of measuring multiple types of ties, the researchers take the union of the different types of ties to construct a measure of the “social network.”Footnote 3 Doing so is the standard approach. At first blush, aggregating ties in this way seems justified since the resulting network contains the data’s maximum information about social ties.

Moreover, a large literature has made clear that social relationships are complex. A relationship between two people can manifest in many ways, featuring a variety of kind of interactions. This “multiplexity” is a key feature of human networks (see [9, 16]). In this view, social ties are multi-faceted and so a person who is my friend may also be my coworker as well as the person with whom I would discuss politics. We might even say that this relationship is especially rich because it features these three dimensions.

The difficulty arises when we flip this logic and look for evidence of a social tie by measuring these dimensions separately. Of course if everyone in a group of interest who are friends are also coworkers and are also political discusison partners, then there is no issue. Measuring one is the same as measuring all and taking their union. However, as we show below, when we pick a single dimension of a relationship—just political discussion partners, say—the network formed by those ties alone can look quite different from a network formed by a different dimension—working together, for instance. Given that different dimensions can add different information, it is worth considering whether aggregating across the different dimensions is always best.

To preview our argument, consider the setting of [12] in which villagers can spread news to one another, including news that the researchers inserted into the network about an upcoming local event that would give out soap. Suppose that the true way that villagers pass on this kind of news is by telling anyone with whom they have any kind of social relationship. In such a case, measuring a variety of different social relationships and aggregating them would indeed maximize the relevant information.

Now suppose instead that villagers find events hosted by outsiders to be a political matter, worthy of discussion with one’s politically-minded social ties and no others. Or perhaps villagers are concerned about their safety at an unusual event and will only discuss it with their elders. Different still, imagine that villagers fear that their participation, if discovered, would offend the local political elites, and so only spread word to their most trusted contacts. In each of these three scenarios, the true spread of information about the soap event only occurs along a specific kind of social tie—political discussion partners, elders, and the most trusted, respectively.

When this is the case, aggregating different kinds of ties can undermine the detection of network effects. In fact, by including the irrelevant information, the data may no longer reveal the true spread through the network.

3 Different Tie Types Generate Different Networks

To see that different measures of social ties can contain quite different information, consider the social network from one of the Ugandan villages in [12]. Seven different types of relationships are measured among the villagers by survey. Figure 1 highlights four of them. Each villager is asked to name other villagers with whom they regularly spend time (top left), discuss politics (top right), share secrets with (bottom left), and speak to on the phone (bottom right).

Fig. 1.
figure 1

Different structural characteristics are apparent when social ties are measured with different kinds of interactions among the same people. Data from [12].

Although these questions are asked of the same villagers, each of the four dimensions of their social relationships looks very different when visualized as its own network. Taking their union to construct a single social network results in each dimension adding a different amount and type of information. If all of these relationships function the same for something spreading from villager to villager, then aggregation needs no further thought; they all provide a view of the same thing. But if diffusion in fact only occurs along ties based on a particular relationship (or even a strict subset of the relationships), then aggregation can mask our ability to detect it.

4 Theory

Suppose we are interested in evaluating whether peer effects play a role in diffusing a given piece of information in a rural village in Uganda. We hypothesize that once supplied to a random selection of “seed” villagers, the information will spread through each of these seed’s contacts to the contacts of their contacts and so on. Contacts may vary in the type of tie that binds them. For example, contacts with whom an ego shares meals may differ from contacts with whom she works. In order to capture this heterogeneity suppose we measure two types of ties, ties of type A and ties of type B, the sets of which we denote \(S_A\) and \(S_B\) respectively. Some time after seeding the information we survey (non-seeded) villagers and inquire whether they have knowledge of the seeded information.

There is evidence of peer-effects if the likelihood of having knowledge of the information is positively correlated with the proportion of an ego’s neighbors that were seeded. To evaluate this relationship we first need to determine which ties are relevant to quantifying the proportion of an ego’s neighbors that were seeded. Ties of type A, of type B or both?

Suppose our seeded information did diffuse through the village’s underlying social networks but only through ties of type A. In this case only the presence of a tie of type A between any two villagers in our sample is informative of peer effects. Taking the union of \(S_A\) and \(S_B\) is equivalent to introducing noise into our covariate measure. Given this equivalence we can appeal to the extensive literature on covariate measurement error to enumerate the consequences of aggregating networks in the presence of a tie-specific diffusion process. These are:

  1. 1.

    Attenuation bias: coefficient estimates are biased toward zero.

  2. 2.

    Downward biased test statistics: resulting in a higher probability of falsely failing to reject the null (type-II error).

The severity of these effects will depend on the true magnitude of the coefficient and the noise-to-signal ratio.Footnote 4 Unlike traditional covariate measurement error however—over which the researcher often has no control and little information—error resulting from mistakenly aggregating networks can be both avoided and quantified. Denote \(\eta \) as the noise-to-signal ratio and \(n(\cdot )\) as the cardinality of a set. Returning to our hypothetical example we can show that:

$$\begin{aligned} \eta _{AB} = \frac{n(S_B - S_A)}{n(S_A)} \end{aligned}$$
(1)

That is, the noise-to-signal ratio is equivalent to the number of ties in \(S_B\) not in \(S_A\) as a proportion of the number of ties in \(S_A\). Notice, generally \(\eta _{AB} \ne \eta _{BA}\), hence the subscript. It is useful to decompose \(\eta _{AB}\) into two components which we label size- and overlap-ratio. We define these as follows:

Definition 1

Given two types of ties A and B, with respective sets \(S_A\) and \(S_B\), define the size-ratio of \(S_A\) to \(S_B\), as the ratio of the number of ties in \(S_B\) and number of ties in \(S_A\). Formally:

$$\begin{aligned} s_{AB} = \frac{n(S_B)}{n(S_A)} \end{aligned}$$
(2)

Definition 2

Given two types of ties A and B, with respective sets \(S_A\) and \(S_B\), define the overlap-ratio of \(S_A\) to \(S_B\) as the proportion of ties in \(S_A\) also found in \(S_B\). Formally:

$$\begin{aligned} o_{AB} = \frac{n(S_A \cap S_B)}{n(S_A)} \end{aligned}$$
(3)

As with \(\eta _{AB}\), neither ratio is symmetric with respect to its arguments hence the subscripts. It can be shown that \(\eta _{AB}\) is a function of these two ratios. Specifically,

$$\begin{aligned} \eta _{AB} = \frac{n(S_B - S_A)}{n(S_A)} = \frac{n(S_B) - n(S_A \cap S_B)}{n(S_A)} = \underbrace{\frac{n(S_B)}{n(S_A)}}_{\text {size-ratio}} - \overbrace{\frac{n(S_A \cap S_B)}{n(S_A)}}^{\text {overlap-ratio}}= s_{AB} - o_{AB} \end{aligned}$$
(4)

All else equal, the larger the size-ratio, the more noise we are adding by taking the union. However, some of the ties in \(S_B\) may also be found in \(S_A\). These ties do not add any noise.

5 Simulated Networks

In this section we illustrate the network aggregation problem using simulated random networks. Specifically we generate hypothetical networks with two types of ties, type A and type B, with varying size- and overlap-ratios. For a given population of nodes of size N,

  1. 1.

    Generate a list L of all potential ties between pairs of nodes in N.

  2. 2.

    Randomly select a subset S from L.

  3. 3.

    From S, randomly select a subset \(S_A\), to make up the set of links of type A.

  4. 4.

    Remaining links in S along with o percent of links in \(S_A\) make up \(S_B\).

We follow these steps to generate random networks for each combination of size-ratio s and overlap-ratio o such that \(s \in \left\{ 1,2,3,4 \right\} \) and \(o \in \left\{ 0, 0.25, 0.5, 1\right\} \).

For each of our generated networks we then select a random subset of nodes to serve as seeds, \(5\%\) in each. We then simulate a one-period diffusion process in which we assume information can only diffuse through ties of type A (in other words, we assume that diffusion is tie-specific). Knowledge of the information by i, \(Y_i\), is a Bernoulli random variable taking value 1 with probability:

$$\begin{aligned} Pr(y_i = 1) = logit^{-1}(\beta _0 + X^A_i\beta _1) \end{aligned}$$

where \(X^A_i\) is the proportion of i’s neighbors that are seeds according to ties of type A. We set \(\beta _0 = -1\) and \(\beta _0 = 0.2\) and simulate 500 one-period diffusions for each size/overlap-ratio combination.Footnote 5 After each simulation we estimate the “noisy model”:

$$\begin{aligned} Pr(y_i = 1) = logit^{-1}(\beta _0 + X^{agg}_i\beta _1) \end{aligned}$$

where \(X^{agg}_i\) is the proportion of neighbors that are seeds according to the union of \(S_A\) and \(S_B\). In practice, adding noise to the covariate measure reduces its variance, thereby reducing its explanatory power. Table 1 provides evidence for this intuition. Each value corresponds to the ratio of the average standard deviation of \(X^{agg}_i\) and \(X^A_i\) specific to a size- and overlap-ratio combination. As the size-ratio (overlap-ratio) of \(S_A\) to \(S_B\) increases (decreases) the smaller the variance of \(X^{agg}_i\) relative to the variance of \(X^A_i\).

Table 1. Ratio of the mean standard deviations of \(X^{agg}_i\) and \(X^A_i\).

Now define the bias ratio as the ratio of the estimated \(\hat{\beta _1}\) and the true \(\beta _1\). An unbiased estimator applied to the correct model yields on average a bias-ratio of 1. Figure 2 plots the distribution of the bias ratio for varying levels of overlap-ratio, holding the size-ratio equal to 1 and 4 in the left and right sub-figure respectively. Both figures show how the estimated effect is increasingly attenuated as the overlap-ratio between the two networks decreases. Moreover, even in the presence of perfect overlap—formally \(S_A \subseteq S_B\)—, we still observe attenuation as the number of uninformative ties increases in proportion to informative ties (see Fig. 2b).

Fig. 2.
figure 2

Mean bias ratio as a function of size- and overlap-ratio.

As noted above, adding noise to the covariate measure also results in downward biased test statistics thereby increasing the probability of making type-II inferential errors. For each set of simulations -500 for each size-/overlap-ratio combination- we computed the proportion of regressions for which \(\beta _1\) is statistically significant at 0.05. Table 2 summarizes these results. It is clear that as the size-ratio increases and overlap-ratio decreases the proportion of regressions for which we correctly reject the null decreases.

Table 2. Proportion of simulations with significant \(\beta _1\) (\(95\%\) level)

6 Conclusion

Our article reexamines the standard approach of social network measurement, which defaults to taking the union of multiple types of social ties. This approach results in one aggregate “social network" that summarizes all information on social ties contained in the data. We point out that for many realistic cases, this logic is too simple, and can mask network effects that are in fact present.

We show that the decision of whether and how to aggregate ties should be informed by two factors: (1) whether the effect in question diffuses generically or in a tie-specific way, and (2) the extent of overlap between the measured networks. Aggregating ties when diffusion is tie-specific and overlap is low will negatively bias and potentially mask network effects that are in fact present.

Of course, knowing how the diffusion process should work is a substantial requirement. Our results highlight just how crucial strong theory and qualitative knowledge are for detecting meaningful network effects.