1 Introduction

An increasing number of today’s social interactions occur through online social media. Some online social networks have become extremely popular in the last decade. They differ in the character of the service they provide to online users. For instance, Facebook can be seen mainly as a platform for keeping in touch with close friends and relatives, Twitter is used to propagate and receive news, LinkedIn facilitates the maintenance of professional contacts, and Flickr gathers amateur and professional photographers. Although different, all these online platforms share an ingredient that pervades all their applications: an underlying social network that allows their users to keep in touch with each other and helps to engage them in common activities or interactions, leading to a better fulfillment of the service’s purposes. This is the reason why these platforms share a good number of functionalities, e.g., personal communication channels, broadcast status updates, easy one-step information sharing, and news feeds exposing broadcast content. As a result, online social networks are an interesting field in which to study online social behavior that seems to be generic across the different online services. Since at the bottom of these services lies a network of declared relations and the basic interactions in these platforms tend to be pairwise, a natural methodology for studying these systems is provided by network science. In this chapter, we describe some of the results of research studies on the structure, dynamics, and social activity in online social networks. We present them in the interdisciplinary context of network science, sociological studies, and computer science.

2 Structure of Social Networks

Social networks in general show a very rich internal structure [1] that in some aspects falls quite far from random graphs or even from artificial networks created by virtue of a preferential attachment mechanism. In this section we briefly review the most important features broadly found in social networks.

2.1 Degree Distribution

The most fundamental characteristic of a network is the distribution of degrees: a function that measures how many friends the members of the network have and how much this number varies among the users. The degree distribution in social networks is usually broad. These distributions have typically been modeled as functions with a heavy tail, such as a power law or a lognormal, combined with an exponential cutoff at large values of the number of friends [27]. This means that there is a large variability in the number of connections of the nodes, with many nodes having a small or moderate number of friends and a small number of them maintaining a large number of friends. Almost all users of online social networks are connected in a largest connected component [4, 7]. Some of the studies also point out that online social networks contain a densely connected core or cores [4, 5] consisting of groups of high-degree nodes that hold the network together. The existence of such cores provides paths that connect distinct parts of the network. A well-known aspect of social networks is that the average shortest path distance is low [2, 4, 5, 7]. This characteristic is popularly known as six degrees of separation or the small-world effect [8]. The importance of shortcuts for reducing the network path length has been highlighted in [9].

2.2 Triangles and Community Structure

Possibly, the most important feature distinguishing social networks from other types of networks is their high level of clustering or transitivity [1, 2, 4–7, 9]. The clustering coefficient measures the probability that two nodes sharing a common neighbor (a node to which they are both connected) are connected. This property is quantified with a global clustering coefficient C [6] which is defined as

$$\displaystyle{ C = \frac{\text{number of closed connected triples}} {\text{number of connected triples}}, }$$
(1)

where a connected triple of nodes is a sequence of 3 nodes which have at least 2 connections between them and a closed triple is a triangle. One can also define a local clustering coefficient c_i as

$$\displaystyle{ c_{i} = \frac{\text{number of closed triples centered on node i}} {\text{number of triples centered on node i}}. }$$
(2)

In this case a global value of the clustering coefficient may be obtained by averaging the local c_i over all the nodes of the network, ⟨c⟩. One should note that ⟨c⟩ is in general different from the coefficient C and that the latter has a much worse scaling behavior. At the structural level, a high clustering coefficient indicates the presence of many triangles in the network. At the social level, this means that friends of an individual tend to be connected among themselves too. This is a well-known phenomenon in sociology which is important for the formation of strong social ties [10, 11] and affects the emergence of positive and negative relations [12]. At the macroscopic network level, a high density of triangles can be related to the existence of community structure in social networks [13]. Furthermore, the study [14] suggests that in real networks with a high clustering coefficient, community structure emerges without any additional ingredients.
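The difference between C and ⟨c⟩ can be checked on a toy graph. The following is a minimal sketch, not taken from the cited studies; the function name `clustering_coefficients`, the adjacency-dictionary input, and the convention that nodes with fewer than two neighbors contribute c_i = 0 are all assumptions made for illustration.

```python
from itertools import combinations

def clustering_coefficients(adj):
    """Return (C, mean_c) for an undirected graph given as {node: set(neighbors)}.

    C follows Eq. (1): closed triples divided by all connected triples.
    mean_c averages the local c_i of Eq. (2); nodes with fewer than two
    neighbors are assigned c_i = 0 (an assumed convention).
    """
    closed = 0  # triples whose two end nodes are also linked
    total = 0   # all connected triples, counted once per centre node
    local = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        triples = k * (k - 1) // 2
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        closed += links
        total += triples
        local.append(links / triples if triples else 0.0)
    C = closed / total if total else 0.0
    return C, sum(local) / len(local)

# Toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
C, mean_c = clustering_coefficients(toy)
print(C, mean_c)  # C = 3/5, mean_c = 7/12
```

On this toy graph the triangle contributes three closed triples out of five connected triples, so C = 0.6, while the local values (1, 1, 1/3, 0) average to about 0.58.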

The existence of communities in social networks is considered highly relevant both by sociologists [11, 13] and network scientists [15–17]. We give further arguments on this in the third section of this chapter. In online social networks, groups can be identified in several ways. One of them is searching for communities in the graph, defined as parts of the network that are more densely connected than their neighborhood. This approach is usually taken in network science, and various community detection algorithms for finding such clusters have been developed and remain under active development [15]. In addition, some online social networks allow their users to create explicit groups and to declare their membership in them. Although it seems straightforward to make use of such user-declared groups, one should be careful when interpreting them since the incentives for creating such groups may vary [18]. Nevertheless, it has been found that the declared groups tend to have a higher internal clustering coefficient [5], and therefore they may be correlated with the more densely connected parts of the network found by community detection algorithms.

2.3 Assortativity and Homophily

Another common feature of social networks is that connected users tend to be similar [19]. This effect is popularly known as the birds of a feather flock together phenomenon. It manifests itself in social networks through similarities in various properties of connected individuals. From a pure network theory point of view, the similarity may appear as a correlation of degrees between friends, which is called assortative mixing, or as a rich-club effect [20]. In such assortative networks, nodes of high degree tend to be connected to other nodes of high degree, and nodes of low degree tend to be connected to other low-degree nodes. It has been found that offline social networks are assortative, in contrast with networks of other types [1, 21]. However, this is not the only property in which friends are similar. This kind of phenomenon is in general called homophily and is known to be present very broadly in social networks. People who are connected in online social networks tend to have similar ages, live in nearby locations, and have similar interests [4, 7, 17, 22]. It is also considered that people who belong to the same community, namely, the same well-connected group of people, talk about similar topics, which can have an important impact on information and innovation diffusion in social networks [11, 23].
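Degree correlations of this kind are commonly summarized by the Pearson correlation between the degrees found at the two ends of an edge. The sketch below illustrates that standard measure under simple assumptions (an undirected graph given as an edge list without duplicates); the function name `degree_assortativity` is hypothetical and the code is not taken from the cited works.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    `edges` is a list of (u, v) pairs of an undirected graph. Each edge is
    counted in both directions so that the measure is symmetric.
    Positive values indicate assortative mixing, negative values
    disassortative mixing.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var if var else float("nan")

# A star graph: the hub (degree 3) is linked only to leaves (degree 1).
star = [(0, 1), (0, 2), (0, 3)]
r_star = degree_assortativity(star)
print(r_star)
```

A star is the extreme disassortative case, since every edge joins the highest-degree node to a lowest-degree node, and the coefficient evaluates to −1.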

2.4 Differences Between Offline and Online Social Networks

As shown in the previous subsections, many statistical properties of offline social networks are also found online. On the other hand, creating links in an online social network is much less costly than developing offline social relations. These online connections can easily accumulate and pile up to large numbers [24]. If the number of connections increases to the millions, the amount of effort that a user can invest in the relation that each link represents must fall to near zero. An early illustration of how the definition of a social tie affects the characterization of social networks was given in the study of email networks: while the distribution of the number of contacts in address books is a power law [25], it is exponential when the contacts are restricted to reciprocated emails [26]. Moreover, disassortative mixing has been encountered in some online networks [2, 27], in contrast to the assortative mixing characteristic of offline social networks [6]. As a matter of fact, there is an open discussion on the validity of online interactions as indicators of real social activity [24, 28–31]. In order to test the validity of online networks for social studies and to find their limitations, further investigation is needed. In this chapter we present some of the recent results of such studies.

3 Growth in Social Networks

3.1 Preferential Attachment

Many features of complex systems are characterized by heavy-tailed distributions [32, 33], e.g., the frequency of words [34], the wealth of nations [35], and the degree distribution of complex networks [36]. This property is typically perceived as a symptom of the rich-gets-richer principle, from which the so-called preferential growth models stem. The common concept of these models is that the elements of the system grow proportionally to their current size, which is referred to as the preferential growth or preferential attachment rule. Typically, in these models, increments of the defining variables of the system occur in each time step. Such increments can involve adding new elements and/or increasing the sizes of the existing ones according to a preferential growth rule. Preferential models are usually the first approach to explaining heavy-tailed distributions in many different systems [37–40]. In the case of networks, this kind of model became popular a decade ago [36, 41–44]. The first of these models in the context of complex networks was introduced by Barabási and Albert in [41]. In short, in each time step one node is introduced into the system with m edges. These edges are connected to existing nodes with probability proportional to the degree of those nodes. As a result, a network with a heavy-tailed (usually power-law) distribution of node degrees emerges. The rule of the Barabási-Albert model is very simple, which is typically a desirable feature but can be too rigid in some cases. In preferential-growth models, the time unit is directly coupled to the number of newly arriving elements, which can complicate the comparison of the dynamics of these models with real data. Other drawbacks include the lack of heterogeneity and the strong correlation between the age of elements and their size [45]. 
Because of these issues the basic preferential growth model is typically used as a simple model for generating networks with a power-law degree distribution. On the other hand, it is also successfully used as a component of models trying to simulate growth of real social networks [46, 47].
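The growth rule described above (one new node per time step, m edges attached with probability proportional to degree) can be sketched in a few lines. This is an illustrative sketch only: the function name `barabasi_albert`, the choice of a complete seed graph of m + 1 nodes, and the rejection of repeated targets are assumptions, not details given in the text.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow an undirected network of n nodes by preferential attachment.

    Returns a list of edges (new_node, target). The initial core is a
    complete graph on m + 1 nodes (an assumed starting condition).
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from this list selects a node with probability proportional
    # to its current degree.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # draw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs += [new, t]
    return edges

ba_edges = barabasi_albert(n=200, m=3, seed=1)
ba_degree = {}
for u, v in ba_edges:
    ba_degree[u] = ba_degree.get(u, 0) + 1
    ba_degree[v] = ba_degree.get(v, 0) + 1
print(len(ba_edges), max(ba_degree.values()))
```

Note how the time step and the arrival of a node coincide in the loop, which is the coupling between time and system size mentioned above.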

3.2 Heterogeneity

In many real systems, especially in social systems, individuals or elements are very diverse. This factor is related to the heavy-tailed distributions that are so commonly found. In this direction, some models incorporating heterogeneity in the form of fitness, hidden variables, or ranking shuffling have been proposed [48–52]. In general, this family of models determines the growth of elements through some kind of intrinsic property. Whereas in preferential attachment models the growth is proportional to the current size of the elements, in fitness models it is usually proportional to the intrinsic fitness of each element. Typically the fitness is a random variable, specific to each element, drawn from a given distribution. This introduces high heterogeneity among the elements. A number of empirical works show how this intrinsic fitness is distributed and what its role is in the growth of complex systems [53–56]. We discuss in detail one of the models of this family in the next section when commenting on the growth of groups in online social networks.

3.3 Triadic Closure/Triangle Closing

Because the clustering coefficient is remarkably high in some networks (mainly social networks), other growth models have been introduced in order to reproduce the high number of triangles in the network. One of the first models accounting for this was introduced in [9]: a regular network with initially high clustering has its connections rewired, which makes it more realistic and controls both the clustering coefficient and the average shortest path length. A more sophisticated model used to simulate the growth of social networks has been proposed in [47], and one of its main components is triangle closing. In this model, new nodes appearing in the system connect to some node, usually using the preferential attachment rule, and then start closing triangles with neighbors of this node. This simple triangle-closing mechanism yields much more realistic results in modeling online social networks [47].
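The triangle-closing step can be illustrated with a deliberately simplified sketch. It is not the model of [47], which contains further components; the function name `grow_with_triangle_closing`, the triangle used as a seed graph, and the fixed number of closures per new node are assumptions chosen for brevity.

```python
import random

def grow_with_triangle_closing(n, closures=2, seed=None):
    """Grow an undirected graph in which each new node closes triangles.

    A new node first links to a node chosen proportionally to degree and
    then links to up to `closures` random neighbors of that node, so each
    extra link closes a triangle. Returns {node: set(neighbors)}.
    """
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # assumed seed graph: one triangle
    stubs = [0, 0, 1, 1, 2, 2]               # each node listed once per edge end
    for new in range(3, n):
        first = rng.choice(stubs)            # degree-proportional choice
        candidates = sorted(adj[first])      # neighbors of the first contact
        chosen = rng.sample(candidates, min(closures, len(candidates)))
        adj[new] = set()
        for t in [first] + chosen:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

tc_adj = grow_with_triangle_closing(n=50, closures=2, seed=3)
print(sum(len(nbrs) for nbrs in tc_adj.values()) // 2)  # number of edges
```

Because every added node is linked to a contact and to some of that contact's neighbors, each node belongs to at least one triangle by construction, which is what raises the clustering relative to plain preferential attachment.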

3.4 Dynamics of Groups

As we have emphasized in the previous section the existence of communities plays an important role in functioning of social networks. In this section we present studies of the growth of such groups. Several aspects have been identified as positively influencing groups’ growth and their persistence. It has been suggested that high internal connectivity helps declared groups’ growth [57]. Other work argues that flexibility of big communities helps them stay alive longer, while small groups are more persistent if their composition stays unchanged [17].

From the macroscopic perspective growth of groups can be described and modeled using a version of preferential attachment model or a model with hidden variables/intrinsic heterogeneity. A comparison between these two approaches has been performed in [56] using real data from Flickr. The heterogeneous linear growth model suggested in this study assumes linear growth of groups with growth value (fitness) being drawn from heavy-tailed distribution (lognormal) and a number of new groups appearing in the systems growing linearly in time. As a comparison, a version of Simon model [37] has been used, which represents a model from preferential attachment family. As one can see in Fig. 1a, the average growth ⟨α | g⟩ for groups of given size g is proportional to the size of the groups for high g. This commonly is interpreted as the consequence of preferential attachment. However, as it is shown in Fig. 1a, one obtains similar dependence using the heterogeneous linear growth model. This is the case because the average growth is an average over all groups of a given size, each of them growing linearly. Due to the heterogeneity and the linear growth, at a given time, larger groups consist of old groups that grow slowly and younger groups that grow faster. Thus, the observation of preferential growth for groups of the same size does not reflect in this case an underlying rich-gets-richer principle, but it is a consequence of the competition of groups with different growth values and ages. Both the heterogeneous linear growth model and the Simon model produce heavy-tailed distribution of group sizes. However, the former model performs better in other respects. First, in the Simon model the final size of groups is heavily determined by their initial size measured one year before (Fig. 1b); thus, there is little heterogeneity among the groups, in contrast to the heterogeneous linear growth model which displays a degree of heterogeneity similar to the one of real groups. 
Second, for the Simon model the correlation of size and age is strong, while it is weak for real groups and the heterogeneous linear growth model (Fig. 1c–e). In the heterogeneous linear growth model the heavy-tailed distribution of the final sizes of elements does not emerge from the growth process itself (e.g., the rich-gets-richer principle), but from the intrinsic heterogeneity of the elements which take part in this growth process. This certainly does not answer the question of why some groups grow faster than others, as we do not yet understand what factors influence the fitness of the groups. However, it indicates that faster growth does not have to result from one group being bigger than another, as in preferential attachment models. The simplicity of this approach suggests that the characterization of the heterogeneity may play an important role in understanding the origin of broad distributions and the time evolution of many real systems.
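The argument that heterogeneity alone can mimic preferential growth can be explored with a small simulation. The sketch below is only an illustration of the idea, not the calibrated model of [56]: the function name `heterogeneous_linear_growth`, the lognormal parameters, the birth rate, and the time horizon are all hypothetical values.

```python
import random

def heterogeneous_linear_growth(days=100, births_slope=1.0, mu=0.0, sigma=1.5, seed=0):
    """Simulate groups that each grow linearly at their own fixed rate.

    On day t, int(births_slope * t) new groups appear, so the number of
    new groups grows linearly in time. Each group draws a growth value
    alpha (its fitness) from a lognormal distribution and gains alpha
    members per day. Returns a list of (alpha, age, size) at the last day.
    """
    rng = random.Random(seed)
    groups = []
    for day in range(1, days + 1):
        for _ in range(int(births_slope * day)):
            alpha = rng.lognormvariate(mu, sigma)
            age = days - day + 1
            groups.append((alpha, age, alpha * age))
    return groups

groups = heterogeneous_linear_growth()
by_size = sorted(groups, key=lambda g: g[2])
half = len(by_size) // 2
# Average growth value in the smaller and in the larger half of the groups.
mean_small = sum(g[0] for g in by_size[:half]) / half
mean_large = sum(g[0] for g in by_size[half:]) / (len(by_size) - half)
print(mean_small, mean_large)
```

Although no group's growth depends on its size, the larger groups show a higher average growth value, because a group that is large at a given time is more likely to be one with a high fitness. This is the kind of apparent preferential growth discussed above.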

Fig. 1

Growth of groups in Flickr. (a) Average daily growth as a function of the initial size of the groups, estimated for the period of 6 weeks and averaged over 260,000 groups of a given initial size, for the real data from Flickr (circles), the heterogeneous linear growth model (triangles). The dashed line corresponds to the linear behavior ⟨α | g⟩ ∼ g. (b) Initial and final group sizes over a period of 350 days for the real data (circles), the heterogeneous linear growth model (filled triangles) and Simon model (diamonds). Each point represents a single group, there are 9,503 points plotted for each set of points. (c–e) Box plots with whiskers at 9th/91st percentile of final size of groups as a function of their age at the time of the measurement for 260,000 groups for (c) the real data, (d) the heterogeneous linear growth model, and (e) the Simon model (Adapted from [56])

4 Activity in Online Social Networks

In general a social network is a broad term, and it refers to a set of actors and a set of ties between them representing some kind of relation or interaction. In fact, however, there are many types of both relations and interactions, and usually they happen on top of each other. So far we mostly discussed social networks which represent a particular relation or interaction, e.g., coappearance in movies, boards of directors or coauthorship [1, 6, 21], network of online friendship [2, 5, 7, 47], and network of communication [4, 17, 58]. In online social networks, we can relate user activity with their declared relations with other users. In other words, one can relate pairwise (rarely one-to-many) interactions of users with their declared social network. We describe the studies which tackle this issue in this section.

4.1 Activity Networks Vs. Declared Social Network

The comparison of the network built from declared online friends and the network built from user interactions shows several differences at the structural level. First of all, the actors tend to interact with far fewer people than they declare as friends, which results in smaller node degrees in the interaction network [59, 60]. Moreover, the friends they interact with change rapidly, and only about 30% of pairwise interactions in one month continue over the next month [59]. Because the degrees are lower, the properties related to the small-world effect are also less evident, namely, average path lengths are higher [60] and there are fewer densely connected cores [61].

4.2 Theories on Social Ties and Information Diffusion

The theory known as the strength of weak ties, proposed by Granovetter [11], deals with the relation between structure, intensity of social ties, and diffusion of information in offline social networks. On one hand, a tie can be characterized by its strength, which is related to the time spent together, intimacy, and emotional intensity of a relation. Strong ties refer to relations with close friends or relatives, while weak ties represent links with distant acquaintances. On the other hand, a tie can be characterized by its position in the network. Social networks are usually composed of communities. A tie can thus be internal to a group or a bridge between groups, as in Fig. 2. Granovetter’s theory predicts that weak ties act as bridges between groups and are important for the diffusion of new information across the network. Strong ties are predicted to be located at the interior of the groups between actors who have many friends in common. Burt’s work [13] emphasizes the advantage of connecting different groups to access novel information due to the diversity in the sources.

Fig. 2

Different types of links depending on their position with respect to the groups’ structure: internal, between groups, intermediary links, and no-group links (Adapted from [16])

Furthermore, more recent works point out that information propagation may depend on the type of content transmitted [62–64] and on a diversity-bandwidth trade-off [65]. The bandwidth of a tie is defined as the rate of information transmission per unit of time. Aral et al. [65] note that weak ties interact infrequently and, therefore, have low bandwidth, whereas strong ties interact more often and have high bandwidth. The authors claim that both diversity and bandwidth are relevant for the diffusion of novel information. Since these are anticorrelated, there has to be a trade-off to reach an optimal point in the propagation of new information. They also suggest that strong ties may be important for propagating information, depending on the structural diversity, the number of topics, and the dynamics of the information. Due to the different nature of online and offline interactions, it is not clear whether online networks are organized according to the sociological theories. In the following subsection we present the results of some works testing whether these theories apply to online social networks.

4.3 Testing Social Theories in Online Social Networks

The predictions of the theory of the strength of weak ties have been checked in a mobile phone call dataset [58] and, very recently, in online social networks [16, 66, 67]. Different predictors have been considered to estimate social tie strength [68] including, for instance, time spent together [68], the duration of phone calls [58], or the number of messages exchanged [16, 66]. Two works [16, 58] have measured the dependence of the strength of a tie on the number of common friends shared by the two actors, showing that the more friends they share, the more likely it is that the tie is strong. This agrees with the homophily effect in social networks described at the beginning of this chapter. Many shared friends of a pair of users coupled by a strong tie can be interpreted as high homophily between them in terms of acquired friends. Furthermore, a large field experiment performed at Facebook [66] has isolated the effects of homophily and social influence on the probability of information propagation in an online social network. The study has shown that users are around 7 times more likely to rebroadcast a piece of information published by their friends if they are exposed to it, which is interpreted as a 7 times higher chance of information propagation due to social influence than due to homophily. Moreover, the work argues that the weaker the tie for which information propagation is considered, the higher the likelihood of information flow due to social influence. This corresponds to Granovetter’s prediction that weak ties are important for information diffusion. In the following paragraphs we describe in more detail the findings of a similar study on Twitter [16], a popular social microblogging platform.

Online networks are promising for social studies due to the wide availability of data and the fact that different types of interactions are explicitly separated, e.g., information diffusion events are distinguished from more personal communications. Diffusion events are implemented as a system option in the form of share or repost buttons, with which a single click on a piece of information is enough to rebroadcast it to all the user’s contacts. This is in contrast to personal communications, for which more effort has to be invested to write a short message and to select the recipient(s). In Twitter such actions are called, respectively, retweet [69] and mention/reply [70]. The more mentions have been exchanged between two users, even more so if reciprocated, the stronger we consider the tie between them. A declared network also exists in Twitter and is made of directed follower links. Using clustering algorithms, one can find communities of more densely connected users in this network. Specifically, in the study which we present, various clustering algorithms have been used (as shown in the Supporting Information of [16]), and for brevity, we will focus only on the results for OSLOM [71]. Granovetter’s theory predicts that strong social ties should occur more often inside communities. This is what happens for links with mentions. We define the fraction f as the ratio between the number of links with a specified type of interaction in a given position with respect to the groups of corresponding size and the total number of links with that interaction. The fraction f reveals an interesting pattern as a function of the group size, as can be seen in Fig. 3a. Links with mentions are more abundant inside communities than any other links. This effect is especially significant for groups of sizes from 10 to 150 members. 
In addition, the distribution of the number of times that a link is used (intensity) for mentions is wide, which allows for a systematic study of the dependence of intensity and position (see Fig. 3b). It turns out that the more intense (or reciprocated) a link with mentions is, the more likely it becomes to find this link as internal (Fig. 3c). This corresponds to Granovetter’s expectation that the stronger the tie is, the higher the number of mutual contacts of both parties it has and the higher the chance that the parties belong to the same group.

Fig. 3

Internal activity in Twitter. (a) Fraction f of internal links as a function of the group size in number of users. The curve for the follower network acts as baseline (black) for mentions (red) and retweets (green). Note that if mentions/retweets were randomly appearing over follower links, then the red/green curve should collapse with the black curve. Inset: links with mentions divided by the baseline (red) and links with retweets divided by the baseline (green). (b) Distribution of the number of mentions per link. (c) Fraction of links with mentions as a function of their intensity. The dashed curves are the total for the follower network (black) and for the links with mentions (red). While the other curves correspond (from bottom to top) to fractions of links with: 1 non-reciprocated mention (diamonds), 3 mentions (circles), 6 mentions (triangle up), and more than 6 reciprocated mentions (triangle down) (From [16])

The communication between groups can take place in two ways: the information can propagate by means of links between groups or by passing through an intermediary user belonging to more than one group; see Fig. 2. We have defined as intermediary the links connecting a pair of users sharing a common group and with at least one of the users belonging also to a different group. In order to estimate the efficiency of the different types of links as attractors of mentions and retweets, a ratio r was measured, defined as the number of links with a specified interaction in a given position divided by the total number of links in that position. The bar plot with the values of r is displayed in Fig. 4. The efficiency of the different types of links can thus be compared for the attraction of mentions (red bars) and retweets (green bars). Links internal to the groups attract more mentions and fewer retweets than links between groups, in agreement with the predictions of the strength of weak ties theory. Intermediary links are as likely to attract mentions as internal links: the ratio of intermediary links with mentions is very close to the ratio of internal links with mentions. This is expected because intermediary links are also internal to the groups. However, the aspect that most differentiates intermediary links from other types of links is the way that they attract retweets. Intermediary links bear retweets with a higher likelihood than either internal or between-groups connections (see Fig. 4a). This fact can be interpreted within the framework of the trade-off between diversity and bandwidth [65]: strong ties are expected to be internal to the groups and to have high bandwidth, while ties connecting diverse environments or groups are more likely to propagate new information. High-bandwidth links in our case correspond to those with multiple mentions, while links providing large diversity are the ones between groups. 
Intermediary links exhibit these two features: they are internal to the groups, and thus statistically bear more mentions, and they introduce diversity through the intermediary user’s membership in several groups. Moreover, in line with the theories [11, 13, 65], higher diversity increases the chances for a link to bear retweets, as can be seen in Fig. 4b, which implies a more efficient information flow. The inset of the figure shows that the number of retweets correlates positively with the number of non-shared groups assigned to the users connected by the link, reaching up to twice the expected value.
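The classification of links by position and the ratio r can be made concrete with a small sketch. Only the definition of intermediary links and of r is taken from the text; the treatment of the remaining categories (internal, between groups, no-group), the function names `link_position` and `ratio_by_position`, and the toy data are assumptions for illustration.

```python
from collections import Counter

def link_position(u, v, groups_of):
    """Classify the link u -> v by its position relative to the groups.

    `groups_of` maps each user to the set of groups the user belongs to;
    groups may overlap.
    """
    gu, gv = groups_of.get(u, set()), groups_of.get(v, set())
    if not gu or not gv:
        return "no-group"      # assumed: at least one user has no group
    shared = gu & gv
    if not shared:
        return "between"       # both users in groups, none in common
    if (gu | gv) - shared:
        return "intermediary"  # common group plus some other group
    return "internal"

def ratio_by_position(follower_links, active_links, groups_of):
    """r = links with the interaction in a position / all links in that position."""
    total = Counter(link_position(u, v, groups_of) for u, v in follower_links)
    active = Counter(link_position(u, v, groups_of)
                     for u, v in follower_links if (u, v) in active_links)
    return {pos: active[pos] / total[pos] for pos in total}

# Hypothetical toy data: five users, two overlapping groups.
groups_of = {"a": {1}, "b": {1}, "c": {1, 2}, "d": {2}, "e": set()}
follower_links = [("a", "b"), ("b", "a"), ("a", "c"),
                  ("c", "d"), ("a", "d"), ("e", "a")]
retweet_links = {("a", "c"), ("a", "d")}
r = ratio_by_position(follower_links, retweet_links, groups_of)
print(r)
```

In this toy example user c belongs to both groups, so the links a → c and c → d are intermediary, a → d is a link between groups, and the links between a and b are internal.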

Fig. 4

Intermediary links. (a) Ratio r between the number of links with mentions or retweets and number of follower links. (b) Distribution of the links in the follower network (black curve), those with mentions (red curve) and retweets (green curve) as a function of the number of non-shared groups of the users connected by the link. Inset, ratios between these distributions and the follower network (From [16])

5 Summary

Research on online social networks is a rich and active field of study. The availability of large amounts of data allows for studies of both the dynamics of social networks and the user–user activity on the social network connections. Different growth models have been proposed to simulate the growth of the network, among which the three main families are preferential growth models, fitness or hidden variable models, and triangle-closing models. The last family is reported to yield the most accurate results; however, it also incorporates the mechanism of preferential attachment. The main advantage of the triangle-closing model is that it directly produces a network with enough clustering, which is reported to be a feature of social networks. Moreover, there are still open questions about the origin of these mechanisms and of some other phenomena observed during the growth process, such as network densification [72]. While the declared social network evolves, different types of interactions occur among its members, mostly among users already connected in the declared social network.

Recent studies have shown that different types of interactions happen according to the patterns predicted by the sociological theories. In general strong ties, which in online social networks are usually defined as the links with many messages exchanged between the pair of users, happen more often between users who have many friends in common or who belong to the same communities. On the other hand, weak ties appear more often between users who do not share friends and belong to different groups. It has been shown that weak ties are more efficient for the information spreading than strong ties. Closer study shows that trade-off between diversity and bandwidth may be crucial for diffusion of information.

In conclusion, the dynamics and activity in online social networks are remarkably rich, tell us much about our social behavior, and confirm some of the known offline social theories. We expect that this field of research will remain active in the coming years and that numerous further online observations and experiments will be undertaken to better understand and quantitatively describe social behaviors.