Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

A broad range of systems are made of elements in interaction and can be represented as networks. Important examples include social networks, the Internet, airline routes, and a wide range of biological networks. The study of networks has emerged in the last decade as one of the fundamental building blocks in the wider study of complex systems [8, 15, 39]. Complex network theory emphasizes that the interactions between the individual components of the system are primordial to understand global emergent behavior and give the possibility to analyze systems of a very different nature within a single framework. Despite the fact that nodes and links have a different nature and are driven by different mechanisms for, e.g., the Web and food webs, the identification of similar structures in various complex systems suggests the existence of generic organization principles. A good example is the omnipresent multi-scale modular organization of complex networks, namely, the fact that they are made of modules, also called communities [62]. The precise mathematical definition of what communities are is still the subject of active research, but most definitions agree on the fact that modules, also called communities, are defined as subnetworks that are locally dense even though the network as a whole is sparse [20, 49]. In many complex systems it seems that modularity does not exist only at a single organizational scale but rather that each module can be further partitioned into a set of sub-modules and within each sub-module there may be sub-sub-modules, etc. In other words, many systems have the fractal property of multi-scale modularity. The detection of such community structure can be of importance for the understanding of the interplay between the structural, dynamical, and functional features of the network. This identification also has the advantage of providing a coarse-grained representation of the system, thereby allowing to sketch its organization and to identify sets of nodes that are likely to have hidden functions or properties in common. The main purpose of this chapter is to present methods uncovering the multi-scale organization of complex networks. In particular, we will emphasize methods finding modules based on the exploration of the network by a random walker at different time scales. The diffusive process allows for multistep transitions exploring further afield the structure of the graph, which results in the detection of community structure across scales, from finer to coarser. Time thus acts as a resolution parameter, allowing to zoom in and out to uncover the multi-scale structure of the system. As we will argue, this dynamic approach has the further advantages of taking into account the constraints imposed by the network structure on flows and of providing a unifying framework for a broad range of community detection methods.

2 Communities: Why Communities?

For many years, researchers, intrigued by the apparent modularity of many social, technological, and biological systems, have searched for mechanisms driving the evolution of systems towards a modular architecture [38]. One of the earliest and most influential ideas was formulated by Simon [62, 63] who argued that a nearly decomposable system, namely, a system of sparsely interconnected modules, allows faster adaptation of the system in response to changing external conditions. Modular systems can evolve by evolution in one module at a time or by duplication and mutation of modules. Well-adapted modules thus represent stable intermediate states whose stability is not jeopardized by evolution in other modules. This robustness represents a major advantage for any system evolving under fluctuating selection criteria, and this may explain the general prevalence of modular architectures across a very wide range of systems. This idea that a hierarchically modular design will be more rapidly and robustly assembled is illustrated by Simon [62] in a parable about two watchmakers, called Hora and Tempus:

The watches the men made consisted of about 1,000 parts each. Tempus had so constructed his that if he had one partly assembled and had to put it down to answer the phone say it immediately fell to pieces and had to be reassembled from the elements. The better the customers liked his watches, the more they phoned him, the more difficult it became for him to find enough uninterrupted time to finish a watch.

The watches that Hora made were no less complex than those of Tempus. But he had designed them so that he could put together subassemblies of about ten elements each. Ten of these subassemblies, again, could be put together into a larger subassembly; and a system of ten of the latter subassemblies constituted the whole watch. Hence, when Hora had to put down a partly assembled watch in order to answer the phone, he lost only a small part of his work, and he assembled his watches in only a fraction of the man-hours it took Tempus.

This idea has since been explored further. For instance, in [25], it has been argued that modular networks are optimal for performing tasks in a changing environment. In situations where different tasks share basic sub-functions, evolutionary pressure produces networks where modules specialize in these sub-functions and where each of the tasks is obtained by a rapid recombination of these building functions. Other types of argument have also been put forward, including:

  • Modular network topology is associated with a rich nonlinear dynamical behavior. Modular networks tend to produce time-scale separation, i.e., fast intra-modular processes and slow inter-modular processes [3, 45, 46], which helps at preserving coexistence and diversity in the system [27] and at finding a balance between segregated and integrated activity [47, 59]. The high dynamical complexity [64] leads to complex dynamical states such as transient chimera states [1, 60] where synchronization and desynchronization coexist across the network. Hierarchical modularity also enhances dynamical reconnectability [51], as marginally stable networks can be combined or divided while preserving stability.

  • Another plausible mechanism for the formation of modules is coevolution, as the network structure and function coevolve [22]. A broad range of models with adaptive rewiring typically incorporate a reinforcement of links between synchronized units and a pruning of links between asynchronized ones. This feedback between structure and dynamics, similar to synaptic plasticity in neuronal dynamics, naturally drives the emergence of inhomogeneities and modules in networks [31].

  • Modular networks have the property of small-worldness which is advantageous in a broad range of systems by combining high clustering and global connectivity [70]. For instance, in social systems, small-worldness balances the relative benefits of social cohesion and brokerage [30]. In neuroscience, the high clustering of connections favors locally segregated processing (with low wiring cost) of specialized functions, while the short path length supports globally integrated processing of more generic functions [65].

3 Combinatorial Approaches to Community Detection

3.1 Community Detection

Community detection aims at uncovering the modular organization of large-scale networks in an automatic fashion [20, 35, 49]. Most community detection methods find a partition of the nodes into communities, where most of the links are concentrated within the communities. Each node is assigned to one and only one community, i.e., partitions are not compatible with overlapping communities [2, 16, 44]. At the heart of a partitioning method, there is a mathematical definition for the quality of any partition. Once a quality function has been defined, different types of heuristics can be used in order to find, approximatively, its optimal partition, i.e., to find the partition having the highest value of the quality function. In a majority of cases, this quality function is based on the number of links within and/or across the communities, i.e., this is a combinatorial approach where quality is measured by counting certain motifs in the graph.

3.1.1 Modularity and Its Limitations

The most popular quality function for community detection is Newman–Girvan modularity, which we will describe in detail in this section. In the following, we focus on unweighted networks for the sake of clarity, but the results are directly applicable to weighted networks. Let A be the adjacency matrix of the network, where A ij determines the presence of a link going from j to i. The in-degree and out-degree of node i are defined as k i in ≡  j A ij and k i out ≡  j A ji , respectively; L ≡  i, j A ij is the total number of links in the network. If the network is undirected, the adjacency matrix is symmetric A ij  = A ji , \(k_{i}^{\mathrm{in}} = k_{i}^{\mathrm{out}} = k_{i}\), and the number of undirected links \(m = L/2\).

Modularity is a function of the adjacency matrix A and of the partition \(\mathcal{P}\) of the nodes into communities. It measures if links are more abundant within communities than would be expected on the basis of chance

$$\displaystyle{ Q = \mbox{ (fraction of links within communities)}-\mbox{ (expected fraction of such links)} }$$
(1)

and reads

$$\displaystyle{ Q ={ 1 \over L} \sum _{C\in \mathcal{P}}\sum _{i,j\in C}{\biggl [A_{ij} - P_{ij}\biggr ]} }$$
(2)

where i, j ∈ C is a summation over pairs of nodes i and j belonging to the same community C of \(\mathcal{P}\) and therefore counts intra-community links. The null hypothesis is an extra ingredient in the definition and is incorporated in the matrix P ij . P ij is the expected number of links between nodes i and j over an ensemble of random networks with certain constraints. These constraints correspond to known information about the network organization, i.e., its total number of links and nodes, which has to be taken into account when assessing the relevance of an observed topological feature.

Let us first consider undirected networks. Two standard choices of null model are

$$\displaystyle\begin{array}{rcl} P_{ij} =\langle {k\rangle }^{2}/2m,& \text{then}Q \equiv Q_{\mathrm{ unif}}&\end{array}$$
(3)

where \(\langle k\rangle = 2m/N\) is the average degree and the only constraint is thus the total number of links in the network, and

$$\displaystyle\begin{array}{rcl} P_{ij} = k_{i}k_{j}/2m,& \text{ then}Q \equiv Q_{\mathrm{conf}}.&\end{array}$$
(4)

where randomized networks now preserve the degree of each node. The latter null model is usually preferred because it takes into account the degree heterogeneity of the network [42]. More complicated null models can in principle be constructed in order to preserve other properties of the network under consideration [5, 43]. For instance, in the case of directed networks, it is common to use the null model

$$\displaystyle\begin{array}{rcl} P_{ij} = k_{i}^{in}k_{ j}^{out}/L,& \text{ then}Q \equiv Q_{\mathrm{ dir}}.&\end{array}$$
(5)

preserving the in- and out-degrees of each node.

In the case of undirected networks, again, it is interesting to note that Q unif and Q conf are naturally related to the combinatorial Laplacian \(L_{ij}^{(C)} = A_{ij} - k_{i}\delta _{ij}\) and the (normalized) Laplacian \(L_{ij} = A_{ij}/k_{j} -\delta _{ij}\), respectively,Footnote 1 and, more generally, to the dynamics induced by these operators, as we will discuss more in detail in Sect. 4. For Q conf, this relation is particularly clear after expressing modularity in terms of the (right) eigenvectors v α of L ij , i.e., v α satisfy j L ij v α, j  = λ α v α, i . Without loss of generality, we assume that λ 1 > λ 2 ≥  ≥ λ α  ≥  ≥ λ N . The dominant eigenvector v 1 of eigenvalue λ 1 = 0 is given by \(v_{1;i} = k_{i}/2m\) and is unique if the network is connected. By using a spectral decomposition of L ij , one finds [13]

$$\displaystyle\begin{array}{rcl} Q_{\mathrm{conf}} =\sum _{ \alpha =2}^{N}\frac{\lambda _{\alpha } + 1} {2m} \sum _{C}\sum _{i,j\in C}v_{\alpha ;i}v_{\alpha ;j}& &\end{array}$$
(6)

where the contribution of the dominant eigenvector v 1 and the null model have cancelled each other out.

Modularity has become an essential element of a large number of clustering methods for large-scale networks. These methods aim at optimizing the modularity of a graph, i.e., finding the partition having the maximal value of Q. An exhaustive optimization of Q is impossible because of the explosion in the number of ways to partition a graph, when its size increases. It has been shown that the optimization of modularity is an NP-complete problem [9]. For this reason, several heuristics have been proposed to find high-quality partitions [7, 11, 20, 42, 48]. The optimization of modularity has the advantage of being performed without a priori specifying the number of modules nor their size. This procedure has been shown to produce useful and relevant partitions in a number of systems [41]. Unfortunately, it has also been shown that modularity suffers from several limitations [19, 23], partly because modularity optimization produces one single partition, which is not satisfactory when dealing with multi-scale systems. Related to this issue, there is the so-called resolution limit of modularity [19], namely, the fact that modularity is blind to modules smaller than a certain scale. This point originates from the bias of modularity towards modules having a certain scale which might not be compatible with the system architecture [66]. This incompatibility also makes modularity inefficient in practical contexts as it may lead to a high degeneracy of its landscape [21], i.e., the existence of several distinct partitions having a modularity close to the optimum, which implies that approximate solutions of the optimization problem are very dissimilar and that a partition derived from modularity optimization has to be considered with caution.

3.1.2 Multi-scale Methods

Different methods have been proposed to go beyond modularity optimization. A first set of methods looks for local maxima of the modularity landscape in order to uncover partitions at different scales [56]. A good example is the so-called Louvain method, which is a greedy method taking advantage of the hierarchical organization of complex networks in order to facilitate the optimization of modularity [7]. This heuristic performs the optimization in a multi-scale way: by comparing the communities first of adjacent nodes, then of adjacent groups of nodes found in the first round, etc. It has been shown in several examples that modularity estimated by this method is close to the optimal value obtained with slower methods but also that intermediate partitions are meaningful and correspond to communities at intermediate resolutions [37]. This approach has the advantage of being fast, but it lacks theoretical foundations and is not able to uncover coarser partitions than those obtained by modularity optimization. Moreover, it may produce hierarchies even when the system is single scale or, worse, completely random [23] (see [37] for a discussion of how to deal with this issue).

Another class of methods is based on multi-scale quality functions. These quality functions incorporate a resolution parameter allowing to tune the characteristic size of the modules in the optimal partition and aim at uncovering modules at the true scale of organization of a network, i.e., not at a scale imposed by modularity optimization. The two most popular multi-scale quality functions are ad hoc, parametric generalizations of modularity. A first quantity is the parametric modularity introduced by Reichardt and Bornholdt [50]:

$$\displaystyle{ Q_{\gamma } ={ 1 \over 2m} \sum _{C\in \mathcal{P}}\sum _{i,j\in C}{\biggl [A_{ij} -\gamma P_{ij}\biggr ]}, }$$
(7)

which is usually defined for the configuration null model \(P_{ij} = k_{i}k_{j}/2m\) and mainly consists in changing the effective size of the system \(m_{\mathrm{eff}} = m/\gamma\). The optimization of Q γ leads to larger and larger communities in the optimal partition when γ is decreased. This approach makes use of the size dependence of modularity: because of the factor 1 ∕ 2m in the null model, modularity depends on the total size of the network and not only on its local properties.Footnote 2 Decreasing m eff (increasing γ) increases the expected number of links γP ij between i and j, which makes it less advantageous to assign i and j to the same community (because A ij  − γP ij decreases).

An alternative approach proposed by Arenas et al. [4] keeps modularity unchanged but modifies the network by adding self-loops to the original network. This approach therefore consists in optimizing

$$\displaystyle{ Q_{r} = Q(A_{ij} + rI_{ij}). }$$
(8)

As expected, increasing r has a tendency to decrease the size of the communities and the optimal partition of Q is made of single nodes. Even if increasing γ and r has, qualitatively, the same effect on the characteristic size of the communities, one should keep in mind that Q γ and Q r are in general optimized by different partitions, except if the network is regular and the resolution parameters verify \(\gamma = 1 + r/\langle k\rangle\). It is also interesting to note that the quality function Eq. (8) was first proposed in order to preserve the eigenvectors of the adjacency matrix, as the eigenvectors of A ij  + rI ij and A ij are obviously the same. From a partitioning viewpoint, however, the eigenvectors of A ij do not matter as much as the eigenvectors of the combinatorial Laplacian L ij (C) [18] and the normalized Laplacian L ij  [61]. Moreover, modularity is related to the eigenvectors of the Laplacian and not of the adjacency matrix; see Eq. (6). These observations suggest to adapt the unfitting quality function Eq. (8) and to optimize the modularity of a modified adjacency matrix preserving the eigenvectors of L ij . This can readily be done by adding degree-dependent self-loops to the nodes

$$\displaystyle{ A_{ij}^{^{\prime}} = A_{ ij} + r\frac{k_{i}} {\langle k\rangle } \delta _{ij}, }$$
(9)

and by optimizing the quality function

$$\displaystyle{ Q_{r}^{^{\prime}} \equiv Q(A_{ ij} + r\frac{k_{i}} {\langle k\rangle } \delta _{ij}). }$$
(10)

This quality function is equivalent, up to a linear transformation, to Q γ for any network, i.e., not only for regular networks, with \(\gamma = 1 + r/\langle k\rangle\), thereby providing two alternative interpretations to resolution parameters.

Before closing this section, we should emphasize that many other types of methods have also been proposed to detect the multi-scale modular organization of networks, e.g., the hierarchical extension of the Map formalism [55] whose single-scale version is described below, local algorithms [34], or the modeling of the system by hierarchical random graphs [10].

4 Communities: Dynamics and Function

4.1 Linear Dynamics

The behavior of complex systems is determined not only by the topological organization of their interconnections but also by the dynamical processes taking place among their constituents [6]. A faithful modeling of the dynamics is essential because different dynamical processes may be affected very differently by network topology. A full characterization of such systems thus requires a formalization that encompasses both aspects simultaneously, rather than relying only on the topological adjacency matrix [32]. In the simple case of linear dynamics alone, a broad range of qualitatively different processes can be defined on the same graph, e.g., with the same adjacency matrix A ij . Let us consider the class of linear processes defined by the equation

$$\displaystyle{ x_{i;t+\tau } =\sum _{j}B_{ij}x_{j;t} }$$
(11)

where the evolution of a quantity x i , associated to node i, is driven by B ij , a matrix somehow related to the adjacency matrix A ij .

In the subclass of random walk processes alone, modeling the diffusion of some quantity or information between nodes, Eq. (11) takes the form

$$\displaystyle{ p_{i;t+\tau } =\sum _{j}T_{ij}p_{j;t} }$$
(12)

where p j; t is the probability to observe a walker on node i at time t and where T is the transition matrix whose entry T ij represents the probability to jump from j to i in a time interval of duration τ. T ij is a nonnegative matrix verifying the condition i T ij  = 1 to ensure the conservation of probability. Except for those constraints, it is arbitrarily defined on a given graph. Important cases include:

  • Discrete-time, unbiased random walk, where

    $$\displaystyle{ T_{ij} = A_{ij}/k_{j}^{out} }$$
    (13)
  • Biased random walks, where

    $$\displaystyle{ T_{ij}^{(\alpha )} = \frac{\alpha _{i}A_{ij}} {\sum _{k}\alpha _{k}A_{kj}}, }$$
    (14)

    and where α is an attribute biasing the motion of random walkers towards certain nodes.

  • Continuous-time random walks, where walkers perform their jumps asynchronously. In the case of exponentially distributed waiting times between jumps, the transition matrix is

    $$\displaystyle{ T_{ij} = \left ({e}^{-\tau L}\right )_{ ij} }$$
    (15)

    and relation (12) can be seen as the formal solution of the rate equation

    $$\displaystyle{ \dot{p}_{i} =\sum _{j}\left ( \frac{A_{ij}} {k_{j}^{out}} -\delta _{ij}\right )p_{j} \equiv -\sum _{j}L_{ij}p_{j}. }$$
    (16)

    A Taylor expansion of (15) in terms of τ clearly shows that the transition matrix accounts for paths of any length on the graph, i.e., each path of length k contributes proportionally to the probability for a walker to perform k jumps in a time interval τ.

It is interesting to note that to each random walk process corresponds a consensus process, in which nodes imitate their neighbors such as to reach a coordinated behavior. In its simplest form, consensus is implemented by the so-called agreement algorithm [67]. Each node i is endowed with a scalar value x i which evolves as

$$\displaystyle{ x_{i;t+\tau } = \frac{1} {k_{i}^{in}}\sum _{j}A_{ij}x_{j;t}. }$$
(17)

The matrix driving the dynamics now verifies the constraint \(\frac{1} {k_{i}^{in}}\sum _{j}A_{ij} = 1\), i.e., the value on a node is updated by computing a weighted average of the values on its neighbors. Similarly to the case of random walks, the process (17) can be generalized either by introducing a bias in the weighted average or by introducing a rate at which nodes update their state.

4.2 Flow-Based Approaches

The multi-resolution quality functions defined in Sect. 3 have been successfully tested on multi-scale benchmark and empirical networks [17, 50, 52]. They have the further advantage of being mathematically very similar to modularity and of being optimized by modularity optimization algorithms with minimum code development. Unfortunately, the introduction of a resolution parameter, γ or r, feels like a trick and lacks theoretical ground. In order to define a resolution parameter in a more satisfying way and, as we will see, to provide a more solid foundation to Q γ and Q r , we look at communities from a different angle; instead from a combinatorial point of view, where intra-community links are counted as in (2), we investigate from a dynamical point of view accounting for the interplay between structure and dynamics. In the following, we describe two such quality functions and emphasize how their dynamical nature helps at clarifying the concept of resolution parameter and at solving some of the issues of the aforementioned combinatorial quality functions. In each case, the starting point is the following: a flow taking place on a network is expected to be trapped for long times in good communities before being able to escape [13, 53, 68]. This argument suggests to measure the quality of a partition in terms of the persistence of flows of random walkers on the network.

From now on, we consider a random walk process at equilibrium. By doing so, we assume that the process is ergodic, i.e., any initial configuration asymptotically reaches the unique stationary solution. If this is not the case, standard tricks, e.g., teleportation, can be introduced to make the dynamics ergodic [33].

4.2.1 Stability

A first quality function, called stability [13], consists in estimating the probability that a random walker stays in a community during a certain time interval

$$\begin{array}{rlrlrl} R(\tau ) & = \mbox{ (probability for a random walker to be in the } & & \\ &\mbox{ same community initially and at time}\tau ) & & \\ & -\mbox{ (probability for two independent random } & & \\ &\mbox{ walkers to be in the same community)} &\end{array}$$
(18)

when the system is at equilibrium. To clarify this concept, let us focus on the continuous-time random walk Eq. (15) on an undirected network.Footnote 3 By definition, the corresponding stability of a partition is

$$\displaystyle{ R(\tau ) =\sum _{C}\sum _{i,j\in C}{\biggl [\left ({e}^{-\tau L}\right )_{ ij}\pi _{j} -\pi _{i}\pi _{j}\biggr ]}, }$$
(19)

where the probability to find a walker on node i at equilibrium is \(\pi _{i} = k_{i}/2m\) because of the undirectedness of the links. This expression clearly shows that stability depends on time. The quality of a partition is thus measured differently at different time scales and is, in general, maximized by different partitions when time is tuned, thereby leading to a sequence of optimal partitions.

By looking at limiting values of τ, one can show that time acts as a resolution parameter [13, 28]. As time grows, the characteristic size of the communities is thus adjusted to reveal the possible multi-scale organization of the system. In the limit τ → 0, keeping linear terms in t in the expansion of R(τ) leads to

$$\displaystyle{ R(\tau ) \approx (1-\tau )\,R(0) +\tau \, Q_{\mathrm{conf}} \equiv Q(t), }$$
(20)

which is equivalent up to a linear transformation to Q γ and Q r when \(P_{ij} = k_{i}k_{j}/2m\) (with \(\tau = 1/\gamma\), \(\tau =\langle k\rangle /(r +\langle k\rangle )\)). The latter multi-resolution quality functions can therefore be seen as a simple linear approximation of R(τ), which provides a physical interpretation to the resolution parameter r and γ, i.e., the inverse of the time used to explore the network. It is also interesting to note that the configuration null model naturally emerges from the definition of stability and from the dynamics (15). Interestingly, other null models, including the uniform null model, are associated to other random walk processes [28]. We should also stress that this connection with modularity only exists for undirected networks and that stability is radically different from modularity when the network is directed. This difference stems from the fact that detailed balance is not verified at equilibrium for directed networks. In the limit τ → , making use of the spectral decomposition of L, stability simplifies as

$$\displaystyle{ R(\tau ) \approx \frac{1} {2m}{e}^{\tau \lambda _{2}}\sum _{C}\sum _{i,j\in C}v_{2;i}v_{2;j} }$$
(21)

where it is assumed that the second dominant eigenvalue λ 2 of L is not degenerate and v 2 is its corresponding (right) eigenvector. R(τ) is therefore maximized by a partition into two communities in accordance with the normalized Fiedler eigenvector [61].

4.2.2 The Map Equation

An alternative way to measure the trapping of walkers in communities is to adopt a coding perspective and to search for a compact description of trajectories on a network in terms of its communities. To do so, the Map equation method [53, 54] relies on a compression of the description length of a random walk inside and between communities. The underlying principle is that for a strongly modular network, the code for the transitions of the random walker can be efficiently compressed by capitalizing on the presence of the community structure. In this formalism, the movement of the walker is described in terms of two features: first, within each community the movement is encoded assigning a unique code word for each node and a particular exit code word for the community. These are stored in an index codebook that is specific to that community. Second, there is intercommunity codebook with unique code words that describe the movements between different communities. The argument is that a walker will rarely leave a good community and a strong community structure leads to a reduction of the code length since short code words can be reused within each community codebook.

Let us focus again on a random walk process in equilibrium, and consider a partition of the network into communities. The probability of leaving a particular community C at equilibrium is denoted by q C↷  =  i ∈ C jC T ji π i , with \(\pi _{i} = k_{i}/2m\) if the graph is undirected. The Map equation for this partition is

$$\displaystyle{ L(\text{M}) = q_{\curvearrowright }H(\mathcal{Q}) +\sum _{C}p_{\circlearrowright }^{C}H({\mathcal{C}}^{C}), }$$
(22)

where H is the Shannon entropy. The first term of this equation is the weighted entropy associated with the intercommunity movement of the random walker, where the weighting factor q  =  C q C↷ is the overall probability of leaving a community. For the Map coding scheme \(H(\mathcal{Q})\) is the minimal average per-step code length to describe the transition of the walker between different communities

$$\displaystyle{ H(\mathcal{Q}) = -\sum _{C}\dfrac{q_{C\curvearrowright }} {q_{\curvearrowright }} \log _{2}\left (\dfrac{q_{C\curvearrowright }} {q_{\curvearrowright }} \right ). }$$
(23)

In the second term, each term \(p_{\circlearrowright }^{C}H({\mathcal{C}}^{C})\) is the weighted average per-step code length needed to describe the movement of the random walker within (and leaving) community C. The entropy \(H({\mathcal{C}}^{C})\) is given analogously by

$$ \begin{array}{ccc} \begin{array}{rlrlrl} H({\mathcal{C}}^{C}) = -\dfrac{q_{C\curvearrowright }} {p_{\circlearrowright }^{C}} \log _{2}\left (\dfrac{q_{C\curvearrowright }} {p_{\circlearrowright }^{C}} \right ) -\sum _{i\in C} \dfrac{\pi _{i}} {p_{\circlearrowright }^{C}}\log _{2}\left ( \dfrac{\pi _{i}} {p_{\circlearrowright }^{C}}\right ), \end{array} & \end{array} $$
(24)

where \(p_{\circlearrowright }^{C} = q_{C\curvearrowright } +\sum _{j\in C}\pi _{j}\) is the associated weighting factor, describing the probability to use a code word from the codebook of community C. The optimal partition of a network is found by minimizing the Map equation, e.g., by finding the partition minimizing the description length of a random walk on the graph. The Map equation is usually defined for the discrete-time unbiased random walk (13) and is thus based on one-step transitions on the graph. However, it has been shown that the corresponding quality function suffers from some limitations, such as the fact that it is not able to uncover communities in multi-scale networks and that it exhibits a field-of-view limit that can result in undesirable over-partitioning when communities are highly structured [57]. This issue can be addressed by adopting a similar approach as in the previous section, namely, in introducing time explicitly by letting the random walker explore the network over a tunable amount of time [58]. The resulting Markov time sweeping induces a dynamical zooming across scales that can reveal community structure at different scales and circumvent field-of-view limit. In practice, defining the Map equation for the continuous-time random walk (15) leads to time-dependent leaving probabilities q C↷ (τ) given as

$$\displaystyle{ q_{C\curvearrowright }(\tau ) =\sum _{i\in C}\sum _{j\notin C}\left ({e}^{-\tau L}\right )_{ ji}\pi _{i}. }$$
(25)

With increasing time, this leaving probability and the associated cost for encoding communities increase. Looking at limiting values of τ suggests again that time acts as a resolution parameter [58]. In the limit of vanishing times, when the leaving probabilities go to zero, the Map equation is minimized setting each node in its own community. In the limit τ → , in contrast, the Map equation finds a partition made of one single module due to the mean-field nature of the dynamics. For intermediate times, the partitions optimizing the Map equation are expected to be made of modules of varying size, as confirmed by numerical simulations [58].

4.2.3 Discussion

As we have discussed in the previous subsections, flow-based approaches have the interesting property to incorporate a natural resolution parameter, time, allowing to explore the graph through paths of different length. However, flow-based approaches offer other advantages that make them an interesting alternative to combinatorial approaches. First, stability is based on flows of probability on the graph and therefore captures how the global structure of the system constrain patterns of flows, while quantities such as modularity focus on pairwise interactions and are blind to such patterns, thereby neglecting important aspects of the network architecture [53]. This difference is particularly marked when the network is directed, when the equilibrium solution of the process depends on the global organization of the process [28]. Flow-based approaches also offer the flexibility to chose a random process properly modeling how, e.g., information or energy flows on the graph and thus to tailor the quality function according to the network nature. For instance, this can be done by using the biased random walk process (14) in systems where unbiased random walks are not realistic models. Important examples include the Internet and traffic networks, where a bias is necessary to account for local search strategies and navigation rules [69].

Partitions at different values of τ are found independently by optimizing either stability or the Map equation, thereby producing a sequence of partitions that are optimal at different scales. However, one expects that only a small number of these partitions are significant, which raises another question: how can one select the most significant partitions, or equivalently the most significant scales of description of the network? It is ironical to note that we are thus confronted with a problem similar to the one that initially led to the definition of modularity.Footnote 4 In order to address this problem, it has recently been proposed to look for robust partitions [17, 24, 29, 37]. In practice, the problem is slightly modified, by changing either the resolution parameter, the graph, or the optimization algorithm, and variability among the uncovered partitions is considered. In each case, robustness is related to the ruggedness of the quality function landscape [21]. Lack of robustness corresponds to high degeneracy, namely, to the existence of incompatible partitions that are local maxima of the quality function such that partitions are strongly altered by a slight modification of the optimization problem. Significant partitions are uncovered by identifying values of the resolution parameter where these measures of robustness are significantly low. In the case of the Map equation, an alternative approach is based on an information theoretic indicator for the reliability of the Map equation [58], by measuring the gap between the optimal code compression and the compression achieved by the Map coding strategy. Relevant values of the resolution parameter are then signaled by a low compression gap.

Before concluding, let us mention recent works where the ideas developed in this chapter have been successfully tested on empirical networks of different nature and on benchmark networks [1214, 28, 29, 57, 58]. We should also stress that the multi-scale methods described here are expected to fail when the system does not exhibit any scale of description, e.g., when the size of the communities is heterogeneously distributed [35]. In that case, other types of methods might reveal more successful, for instance, local methods in contrast with global ones [36].

5 Conclusion

In this chapter, we have focused on the detection of nonoverlapping modules in multi-scale networks. These networks are made of different levels of organization and are typically (but not necessarily) hierarchical, in the sense that the system is made of modules, which themselves are made of sub-modules, etc. In order to account for the presence of multiple levels of organization in the system, multi-scale methods have been introduced that incorporate a resolution parameter allowing to zoom in and out and to focus on the appropriate level of resolution. A class of popular combinatorial quality functions incorporate a resolution parameter to Newman–Girvan modularity in order to adjust the characteristic size of the modules and to uncover the true modular organization of a network. Unfortunately, these multi-resolution quality functions exhibit the same type of limitation as modularity when the resolution parameter is fixed [26]. Moreover, the introduction of a resolution parameter lacks a sound mathematical justification. Here, we argue for the use of flow-based approaches instead of combinatorial ones for several reasons. We have shown that using the trajectories of random walker on the graph defines quality functions with a natural resolution parameter, time. As time increases, the diffusive process involves longer trajectories and explores further afield the structure of the graph, resulting in the detection of the modular structure across scales. As we have shown, this general scheme can be used in a variety of methods, such as stability and the Map equation. Flow-based approaches have the further advantage of properly taking into account dynamical flows taking place on the graph. This property reveals crucial to provide a faithful characterization of the system whenever complex interdependences between network subunits are generated by patterns of flow, e.g., information in social networks or passengers in airline networks.