Coloring large complex networks

Rossi, Ryan A.; Ahmed, Nesreen K.

doi:10.1007/s13278-014-0228-y

Original Article
Published: 12 September 2014

Volume 4, article number 228, (2014)
Social Network Analysis and Mining
Abstract

Given a large social or information network, how can we partition the vertices into sets (i.e., colors) such that no two vertices linked by an edge are in the same set while minimizing the number of sets used? Despite the obvious practical importance of graph coloring, existing works have not systematically investigated or designed methods for large complex networks. In this work, we develop a unified framework for coloring large complex networks that consists of two main coloring variants that effectively balance the tradeoff between accuracy and efficiency. Using this framework as a fundamental basis, we propose coloring methods designed for the scale and structure of complex networks. In particular, the methods leverage triangles, triangle-cores, and other egonet properties and their combinations. We systematically compare the proposed methods across a wide range of networks (e.g., social, web, biological networks) and find a significant improvement over previous approaches in nearly all cases. Additionally, the solutions obtained are nearly optimal and sometimes provably optimal for certain classes of graphs (e.g., collaboration networks). We also propose a parallel algorithm for the problem of coloring neighborhood subgraphs and make several key observations. Overall, the coloring methods are shown to be (1) accurate with solutions close to optimal, (2) fast and scalable for large networks, and (3) flexible for use in a variety of applications.

1 Introduction

We study the problem of graph coloring for complex networks such as social and information networks. Our focus is on designing (1) accurate coloring methods that are (2) fast for large-scale networks. These requirements lead us to introduce a unified coloring framework that can serve as a basis for investigating and comparing the proposed methods.

Graph coloring is an important fundamental problem in combinatorial optimization with numerous applications including timetabling and scheduling (Budiono and Wong 2012), frequency assignment (Sivarajan et al. 1989; Banerjee and Mukherjee 1996), register allocation (Chaitin 1982), and more recently to study networks of human subjects (Kearns et al. 2006; Chaudhuri et al. 2008), among many others (Colbourn and Dinitz 2010; Moscibroda and Wattenhofer 2008; Ni et al. 2011; Capar et al. 2012; Schneider and Wattenhofer 2011; Grohe et al. 2013). The graph coloring problem consists of assigning colors to vertices such that no two adjacent vertices are assigned identical colors, while minimizing the number of colors. However, in general, the coloring problem is known to be computationally intractable (NP-hard), even to approximate it within $n^{1-\epsilon }$ (Garey and Johnson 1979). Nevertheless, coloring lies at the heart of many applications where the goal is to partition a set of entities into classes where two related entities are not in the same class while also minimizing the number of classes used.

Despite its practical importance in a variety of domains (e.g., engineering, scientific computing), coloring algorithms for complex networks such as social, biological and information networks have received considerably less attention. The majority of work focuses on graphs that are relatively small, synthetic, or from other domains. However, these real-world networks (e.g., social networks) are usually sparse with complex structural patterns (Newman and Park 2003; Boccaletti et al. 2006; Barabasi and Oltvai 2004; Davidson et al. 2013; Kleinberg 2000; Adamic et al. 2001), while also massive in size and growing at a tremendous rate over time. For instance, the web graph has well over 1 trillion pages, whereas social networks such as Facebook have hundreds of millions of users. Unfortunately, coloring algorithms suitable for these large sparse real-world networks have been largely ignored, despite the significance of coloring and its potential for use in a wide variety of applications. Furthermore, for these reasons, there has yet to be a systematic investigation of coloring and its potential applications.

In terms of social networks, coloring has been used for finding roles (see Everett and Borgatti 1991), but that work is limited to extremely small instances and does not scale to the requirements of modern social and information networks present in the age of big data. Others have used coloring to study small controlled groups of human subjects and their behavior (Kearns et al. 2006; Chaudhuri et al. 2008). Nevertheless, coloring methods for large sparse networks have not been proposed, nor has coloring been used for applications in these large networks.

The age of big network data has given rise to numerous opportunities and potential applications for graph coloring including descriptive and predictive modeling tasks. A few of the possibilities are discussed below. For instance, the number of colors, distribution of the size of independent sets, and other properties derived from coloring are useful in tasks such as relational classification (as features) (Sen et al. 2008; De Raedt and Kersting 2008), graph similarity (Berlingerio et al. 2013), anomaly detection (Akoglu et al. 2010; Aggarwal et al. 2011), network analysis (Chaoji et al. 2008; Sun et al. 2008; Kang et al. 2011; Wang and Davidson 2010), or for evaluating graph generators, among many other tasks (Sharara et al. 2012). Additionally, vertex or edge induced neighborhoods may also be colored to study various questions, similar to the work of Ugander et al. (2013a), which used neighborhood motifs instead. Independent sets are also useful in many applications. One such application is network sampling, where vertices/edges may be selected from a large independent set to ensure good network expansion (and of course independence), and may be useful for estimating properties efficiently in the age of big data (Al Hasan and Zaki 2009; Ahmed et al. 2014). Indeed, such a sampling strategy would also be particularly useful for machine learning problems such as relational active learning (Sharma and Bilgic 2013); see the work of Bilgic et al. (2010). It is also easy to find applications in other problem domains, e.g., network A/B testing (Ugander et al. 2013b), which requires running randomized experiments on two independently sampled universes, A and B, to test the effectiveness of new products and marketing campaigns.

Although some recent work has used coloring in small social networks (Enemark et al. 2011; Mossel and Schoenebeck 2010), there has not been any systematic evaluation or comparison of coloring methods for large complex networks of various types. Further, this recent work also used only small networks. Moreover, the majority of previous work used a single coloring method and therefore lacked any evaluation or comparison to other coloring methods. Due to this, the properties and behavior of coloring algorithms for social and information networks are not well understood and are left largely unexplored. This work attempts to fill this gap by developing a variety of techniques that exploit the structure of these large networks while also being fast and scalable for partitioning the vertices into independent sets.

More specifically, we address the theoretically and practically important problem of graph coloring with a focus on coloring large complex networks such as social, biological and technological networks. For this purpose, we develop a flexible framework that serves as a foundation for coloring real-world graphs. The framework is designed to be fast, scalable, and accurate across a wide variety of networks (e.g., social, biological). To satisfy these requirements, we relax the constraint of using the minimum number of colors, and instead focus on balancing the competing tradeoffs of accuracy and performance. This relaxation provides us with a framework that scales linearly with the graph size, while also being accurate, as demonstrated in Sect. 6. Using this framework, we propose three classes of coloring methods designed specifically for the scale and the underlying structure of these complex networks. These include social-based methods, multi-property methods, and egonet-based coloring methods (see Table 1). We also adapt previous coloring methods/heuristics that have been widely used on small and/or dense graphs from other domains (Gebremedhin et al. 2013; Leighton 1979; Matula and Beck 1983; Coleman and Moré 1983; Welsh and Powell 1967; McCormick 1983) and unify them under the greedy coloring framework. This provides us with a basis for comparing our proposed techniques with those traditionally used. We also develop static and dynamic ordering techniques for coloring based on triangle counts, triangle-cores (Zhang and Parthasarathy 2012; Rossi 2014), and a variety of egonet properties, and demonstrate the effectiveness of these methods using a large collection of networks from a variety of domains including social, biological, and technological networks.

The dynamic triangle ordering techniques proposed here are likely to be of use in other applications and/or problems such as for improving community detection (Blondel et al. 2008; Fortunato 2010), distance queries (Jiang et al. 2014), the maximum clique problem (Prosser 2012; Carraghan and Pardalos 1990), and numerous other problems that rely on an appropriate vertex/edge ordering.

We also formulate the problem of coloring neighborhood subgraphs and propose a parallel algorithm that leverages our previous methods. One key finding is that neighborhoods that are colored using relatively few colors are not well connected, with low clustering and a small number of triangles, whereas neighborhood colorings that use a relatively large number of colors have large clustering coefficients and usually contain large cliques. We also find linear speedups and many other interesting results (see Sect. 7 for further details).

In addition to the technical contributions, the other aim of this work is a large-scale investigation of coloring methods for these types of networks. In particular, we compare the three classes of our proposed coloring methods to a wide variety of previous methods that are considered state-of-the-art for relatively small and/or dense graphs from other domains. Using our unified framework as a basis, we systematically evaluate our proposed coloring methods (with past methods) on over 100 networks from a variety of types including social, biological, and information networks.^{Footnote 1}

The types of graphs differ in their size, semantics, structure, and the underlying process governing their formation. Overall, we find a significant improvement over the previously proposed methods in nearly all cases. Moreover, the solutions obtained are nearly optimal and sometimes provably optimal for certain classes of graphs (e.g., collaboration networks). Additionally, the large-scale investigation on 100+ networks revealed a number of useful and insightful observations. One main finding of this work is that despite the pessimistic theoretical results previously mentioned, large sparse networks found in the real world can be colored fast and accurately using the proposed methods.

The remainder of this article is organized as follows: Preliminaries are given in Sect. 2. Section 3 introduces the framework along with the proposed methods while Sect. 4 proposes the more accurate recolor variant. In Sect. 5, we derive the lower and upper bounds used throughout the remainder of the article. Section 6 demonstrates the effectiveness of the proposed methods on over a hundred networks. Next, Sect. 7 formulates the neighborhood coloring problem and proposes a parallel algorithm for coloring neighborhood subgraphs. We also provide numerous results indicating the scalability and utility of our approach. Finally, Sect. 8 concludes.

2 Background

Networks are ubiquitous and can be used to represent data in various domains, including social, biological, and information domains. Facebook is a good example of a real-world network, where vertices represent people, and edges represent relationships/communications among them. In this section, we start by defining the fundamental graph properties used in the problem of coloring networks.

Assume $G=(V,E)$ is an undirected graph used to represent some network, such that $V$ is the set of vertices, and $E$ is the set of edges. We use the term $index(v)$ to refer to the index of a vertex $v$. This index represents the unique identifier of a vertex $v$ as it appears in the graph $G$. One simple example of an index could be the unique userid assigned to each user by online social network providers (e.g., Facebook). Similarly, we use $d(v)$ to represent the vertex degree, such that $d(v)$ is the number of vertices adjacent to $v$ (i.e., its neighbors) in the graph. For example, the degree of a Facebook user is that user's number of friends.

Another property that has proved useful, particularly in social networks, is transitivity. A transitive edge means that if $u$ is connected to $v$ and $v$ is connected to $w$, then $u$ is connected to $w$. In this case $uvw$ represents a triangle in $G$. We use the term $tr(v)$ to refer to the number of triangles incident to a vertex $v$. In common parlance, for a user $x$ in a social network, the number of pairs of friends of $x$ that are also friends themselves is the number of triangles incident to $x$. The concept of transitivity can also be generalized to subgraphs with more than three vertices. In this case, every vertex in the subgraph is connected by an edge to every other. These types of subgraphs are typically called cliques. Note that cliques are maximal subgraphs, meaning that no other vertex in the network can be a member of the clique while preserving the property that every vertex in the clique is connected to every other. In social networks, the occurrence of cliques indicates highly connected subgroups of users, such as co-workers.
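As a minimal sketch of the definition of $tr(v)$ above, the following Python snippet counts, for each vertex, the pairs of its neighbors that are themselves adjacent. The adjacency-dictionary representation, the function name `triangle_counts`, and the toy graph are assumptions made for illustration; they are not taken from the article.

```python
from itertools import combinations

def triangle_counts(adj):
    """Return tr(v) for every vertex v: the number of neighbor pairs of v that are adjacent."""
    tr = {}
    for v, nbrs in adj.items():
        # Each adjacent pair (u, w) of neighbors closes one triangle u-v-w.
        tr[v] = sum(1 for u, w in combinations(sorted(nbrs), 2) if w in adj[u])
    return tr

# Hypothetical toy graph: a triangle a-b-c plus a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(triangle_counts(adj))  # {'a': 1, 'b': 1, 'c': 1, 'd': 0}
```

This brute-force count takes time proportional to the sum of squared degrees, which is adequate for illustration but not tuned for very large graphs.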

Cliques are one example of the more generic concept of network groups. In networks, vertices can be divided into various types of groups or communities that help to explain the underlying network structure. In this section, we introduce two fundamental concepts of network groups related to the problem of coloring networks ($k$-core, and $k$ triangle-core).

A $k$-core is a maximal subgraph of $G$, such that every vertex in the subgraph is connected to at least $k$ others in the subgraph (Matula and Beck 1983). The concept of $k$-core was first introduced in (Szekeres and Wilf 1968). $k$-cores are useful for various applications in network analysis, such as finding communities and cliques (Rossi et al. 2014). A simple algorithm to find the $k$-core of the graph $G$ is to start with the whole graph, and remove any vertices that have degree less than $k$. Clearly, the removed vertices cannot be members of a $k$-core (i.e., a core of order $k$) under any conditions. Note that removing these vertices also reduces the degrees of the vertices adjacent to them. Therefore, the procedure continues until there are no vertices in the graph with degree less than $k$. The output of this procedure is the $k$-core (or $k$-cores) of $G$.
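The removal procedure described above can be sketched as follows. This is an illustrative implementation under assumed names (`k_core`, an adjacency dictionary `adj`), not the authors' code.

```python
def k_core(adj, k):
    """Return the vertex set of the k-core by repeatedly deleting vertices of degree < k."""
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if deg[v] < k]  # vertices that cannot be in the k-core
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed via an earlier stack entry
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1  # removing v lowers the degree of its remaining neighbors
                if deg[u] < k:
                    stack.append(u)
    return alive

# Hypothetical toy graph: a triangle a-b-c plus a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(sorted(k_core(adj, 2)))  # ['a', 'b', 'c']
print(sorted(k_core(adj, 3)))  # []
```

Removing the pendant vertex `d` leaves the triangle, in which every vertex has degree 2; no vertex survives for $k=3$.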

This procedure can also be applied repeatedly to compute the core decomposition of the graph, that is, the core number of each vertex $v$. The core number of a vertex (denoted by $K(v)$) is defined as the highest order $k$ of a maximum $k$-core that $v$ can possibly belong to. While simple to implement, this procedure has a worst case runtime of $O(|E|\cdot |V|\cdot \log |V|)$. However, the runtime can be reduced to $O(|V|+|E|)$ by another implementation, which we use in this paper (see Batagelj and Zaversnik 2003 for more details).
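To make the core number $K(v)$ concrete, here is a small sketch that always removes a vertex of minimum remaining degree. It uses a binary heap with lazy deletion, so it is a simplified stand-in and not the linear-time algorithm of Batagelj and Zaversnik referenced above; the function name `core_numbers` and the toy graph are assumptions.

```python
import heapq

def core_numbers(adj):
    """Return K(v) for every vertex by peeling vertices in order of minimum remaining degree."""
    deg = {v: len(adj[v]) for v in adj}
    # Heap entries are (degree, vertex); vertex labels must be mutually comparable.
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    core, removed, k = {}, set(), 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry
        k = max(k, d)  # core numbers never decrease along the peeling order
        core[v] = k
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return core

# Hypothetical toy graph: a triangle a-b-c plus a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(core_numbers(adj))
```

On this toy graph the triangle vertices get core number 2 and the pendant vertex gets core number 1.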

The concept of $k$ triangle-core has recently emerged in network analysis research. It was first proposed in (Cohen 2009), and improved in (Zhang and Parthasarathy 2012; Rossi 2014). A $k$ triangle-core is an edge-induced subgraph of $G$ such that each edge participates in at least $k-2$ triangles and $k \ge 2$. A subgraph $H_k = (V|E(F))$ induced by the edge-set $F$ is a maximal triangle core of order $k$ if $\forall (u,v) \in F : tr_H(u,v) \ge k-2$, and $H_k$ is the maximum subgraph with this property. Most importantly, we define the triangle core number, denoted $T(u,v)$, of an edge $e=(u,v) \in E$ to be the highest order $k$ of a maximum triangle $k$-core that $e$ can possibly belong to. See Fig. 1 for further intuition. Computing the triangle core numbers of each edge $e$ in the graph $G$ is called the triangle core decomposition of $G$. In Sect. 3.2, we provide an efficient algorithm for computing the triangle core decomposition with runtime $O(|E|^{3/2})$.
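The definition of $T(u,v)$ can be illustrated by an edge-peeling sketch analogous to the one used for $k$-cores: repeatedly remove the edge that currently participates in the fewest triangles. This is a simple heap-based illustration under assumed names (`triangle_core_numbers`, `edge`); it is not the $O(|E|^{3/2})$ algorithm of Sect. 3.2.

```python
import heapq

def edge(u, v):
    """Canonical (sorted) key for an undirected edge; labels must be comparable."""
    return (u, v) if u <= v else (v, u)

def triangle_core_numbers(adj):
    """Return T(u,v) for every edge by peeling edges of minimum triangle support."""
    nbrs = {v: set(ns) for v, ns in adj.items()}  # mutable copy of the graph
    # Support of an edge = number of triangles it currently participates in.
    sup = {(u, v): len(nbrs[u] & nbrs[v]) for u in nbrs for v in nbrs[u] if u < v}
    heap = [(s, e) for e, s in sup.items()]
    heapq.heapify(heap)
    T, k = {}, 0
    while heap:
        s, e = heapq.heappop(heap)
        if e in T or s != sup[e]:
            continue  # stale heap entry
        k = max(k, s)
        T[e] = k + 2  # an edge in at least k triangles lies in a (k+2) triangle-core
        u, v = e
        for w in nbrs[u] & nbrs[v]:  # triangles u-v-w destroyed by removing (u,v)
            for f in (edge(u, w), edge(v, w)):
                sup[f] -= 1
                heapq.heappush(heap, (sup[f], f))
        nbrs[u].discard(v)
        nbrs[v].discard(u)
    return T

# Hypothetical toy graph: a triangle a-b-c plus a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(triangle_core_numbers(adj))
```

Here the three triangle edges receive triangle core number 3, while the pendant edge, which lies in no triangle, receives 2.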

3 Greedy coloring framework

In this section, we present a fast, scalable framework for coloring large complex networks and introduce the variations designed for the structure of the large complex networks found in the real world.

3.1 Problem definition

Let $G=(V,E)$ be an undirected graph. A clique is a set of vertices any two of which are adjacent. The maximum size of a clique in $G$ is denoted $\omega (G)$. An independent set $C$ is a set of vertices any two of which are non-adjacent; thus, $(v,u) \not \in E$ for all $v,u \in C$. The graph coloring problem consists of assigning a color to each vertex in a graph $G$ such that no adjacent vertices share the same color, while minimizing the number of colors used. More formally,

Definition 3.1

(Graph Coloring Problem) Given a graph $G$, find a mapping $\phi : V \rightarrow \{1,\ldots ,k\}$ where $\phi (v_i) \not = \phi (v_j)$ for each edge $(v_i,v_j) \in E$, such that $k$ (the number of colors) is minimum.

This problem may also be viewed as a partitioning of vertices $V$ into independent sets $C_1, C_2,\ldots ,C_k$ where $\{1,2,\ldots ,k\}$ are called colors and the sets $C_1,\ldots ,C_k$ are referred to as color classes. Thus, the graph coloring problem is to find the minimum number $k$ of independent sets (or color classes/partitions) required to color the graph $G$. Nevertheless, graph coloring is NP-hard to solve optimally (on general graphs), and for all $\epsilon > 0$, it is even NP-hard to approximate to within $n^{1-\epsilon }$ where $n$ is the number of vertices (Garey and Johnson 1979).
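The two views of a coloring, as a mapping $\phi$ and as a partition into color classes, can be sketched in a few lines. The helper names `is_proper_coloring` and `color_classes`, the toy graph, and the mapping `phi` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def is_proper_coloring(adj, phi):
    """Check that phi assigns different colors to the endpoints of every edge."""
    return all(phi[v] != phi[u] for v in adj for u in adj[v])

def color_classes(phi):
    """Group vertices by color; each class of a proper coloring is an independent set."""
    classes = defaultdict(set)
    for v, c in phi.items():
        classes[c].add(v)
    return dict(classes)

# Hypothetical toy graph: a triangle a-b-c plus a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
phi = {"a": 1, "b": 2, "c": 3, "d": 1}

print(is_proper_coloring(adj, phi))  # True
print(color_classes(phi))
```

The triangle forces at least three colors, so this particular `phi` with $k=3$ happens to be optimal for the toy graph.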

In this work, we relax the strict requirement of partitioning the vertices into the minimum number of independent sets to allow for colorings that are close to the optimal. This relaxation gives rise to fast linear-time coloring algorithms that perform well in practice (See Sect. 6). Motivated by this, we describe general conditions for greedy coloring that can serve as a unifying framework in the study of these algorithms. More formally, we define the greedy coloring framework as follows:

Definition 3.2

(Framework) Given a graph $G=(V,E)$ and a vertex property $f(\cdot )$, the greedy coloring framework selects the next (uncolored) vertex $v$ to be colored such that

$$\begin{aligned} v = \mathop {\hbox {argmax}}\limits _{v_i} f(v_i) \end{aligned}$$

The selected vertex $v$ is then assigned to the smallest permissible color. This process is repeated until all vertices are colored.

The main intuition of the greedy coloring framework is to color the vertices that are more constrained in their choice of color as early as possible, giving more freedom to the coloring algorithm to use fewer colors, and thus resulting in a tighter upper bound on the exact number of colors. As an aside, selecting the vertex that minimizes $f(v)$ usually results in a coloring that uses significantly more colors than selecting the vertex that maximizes it. Notice that a fundamental property of the above greedy coloring framework is that it is both fast and efficient, thus providing us with a natural basis for investigating the coloring of large real-world networks, which is precisely the scope of this work.
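A minimal sketch of Definition 3.2 follows, assuming a static vertex property so that repeatedly taking the argmax is equivalent to sorting once from largest to smallest. The function name `greedy_color`, the choice of degree as $f$, and the toy graph are illustrative assumptions rather than the article's implementation.

```python
def greedy_color(adj, f):
    """Color vertices in decreasing order of f, giving each the smallest permissible color."""
    # For a static property f, sorting once matches repeated argmax selection.
    order = sorted(adj, key=f, reverse=True)
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}  # colors taken by colored neighbors
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical toy graph: a triangle a-b-c plus a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
coloring = greedy_color(adj, f=lambda v: len(adj[v]))  # f(v) = degree d(v)
print(coloring)  # {'c': 1, 'a': 2, 'b': 3, 'd': 2}
```

A dynamic property, one that changes as vertices are colored, would require re-evaluating $f$ after each assignment instead of sorting once.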

The above definition of the framework uses a selection criterion as the basis for coloring. Instead, we replace the selection criterion with the more general notion of a vertex ordering. More specifically, given a graph $G=(V,E)$ and a vertex ordering

$$\begin{aligned} \pi = \{v_1,v_2,\ldots ,v_i,\ldots ,v_n\} \end{aligned}$$

of $V$, let ${\upchi }(G,\pi )$ denote the number of colors used by a greedy coloring method that uses the vertex ordering $\pi $ of $G$. Hence, the greedy coloring framework selects the next vertex to color based on the vertex ordering. This formalization allows for a more precise characterization of the framework that depends on three components:

1. A graph property $f(G)$ for selecting the vertices to color.
2. The direction in which vertices are selected (e.g., smallest to largest). For instance, $\pi = \{v_1,\ldots ,v_n\}$ is from max to min if $f(v_1) \ge \cdots \ge f(v_n)$, or min to max if $f(v_1) \le \cdots \le f(v_n)$.
3. A tie-breaking strategy for the case when the graph property assigns the same value to two vertices. Suppose $f(v) = f(u)$, then $v$ is before $u$ in the ordering $\pi $ if $f^{\star }(v) > f^{\star }(u)$ where $f^{\star }(\cdot )$ is another graph property used to break ties.
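The three components above (property, direction, tie-breaking) map naturally onto a composite sort key. The sketch below is one possible encoding; the function name `vertex_ordering`, the use of degree as $f$ and triangle count as $f^{\star}$, and the hand-computed values are assumptions for illustration.

```python
def vertex_ordering(vertices, f, f_star, max_to_min=True):
    """Order vertices by f in the chosen direction, breaking ties by larger f_star first."""
    sign = -1 if max_to_min else 1
    # Python sorts ascending, so negate f for max-to-min and negate f_star
    # so that the vertex with the larger tie-breaking value comes first.
    return sorted(vertices, key=lambda v: (sign * f(v), -f_star(v)))

# Hypothetical toy graph: triangle a-b-c, then a path c-d-e.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d"}}
tr = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0}  # triangle counts, computed by hand

degree = lambda v: len(adj[v])
print(vertex_ordering(sorted(adj), degree, tr.get))                    # ['c', 'a', 'b', 'd', 'e']
print(vertex_ordering(sorted(adj), degree, tr.get, max_to_min=False))  # ['e', 'a', 'b', 'd', 'c']
```

Vertices `a`, `b`, and `d` all have degree 2; the triangle count places `a` and `b` ahead of `d`, and the remaining tie between `a` and `b` is resolved by the input order because the sort is stable.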

Notice that two vertex orderings $\pi _1$ and $\pi _2$ from the graph property $f(G)$ may significantly differ in the number of colors used in a greedy coloring (i.e., ${\upchi }(G,\pi _1) \not = {\upchi }(G,\pi _2) + \epsilon $). This is due to the direction of the ordering (e.g., smallest to largest) and the tie-breaking strategy selected. Consequently, a specific graph property $f(\cdot )$ defines a class of orderings where the order direction (e.g., from max to min) and tie-breaking strategy ($f^{\star }(\cdot )$) represent a specific member of that class of orderings. Note that in general $f(G)$ can be thought of simply as a function for obtaining an ordering $\pi $.
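A tiny example shows how much the ordering alone can matter. On a path with four vertices, which is 2-colorable, one ordering yields two colors while another yields three. The function name `num_greedy_colors` and the specific orderings are illustrative assumptions.

```python
def num_greedy_colors(adj, order):
    """Number of colors used by greedy coloring when vertices are visited in `order`."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:  # smallest permissible color
            c += 1
        color[v] = c
    return max(color.values())

# Path a - b - c - d (chromatic number 2).
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

print(num_greedy_colors(adj, ["a", "b", "c", "d"]))  # 2
print(num_greedy_colors(adj, ["a", "d", "b", "c"]))  # 3
```

In the second ordering both endpoints receive color 1 first, which forces `b` to take color 2 and then `c`, adjacent to colors 1 and 2, to take color 3.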

In addition, we also define a few relationships between the graph parameters introduced thus far. Clearly, ${\upchi }(G,\pi )$ from a greedy coloring method is an upper bound on the exact number of colors required, denoted by $\chi (G)$, i.e., the minimum number of colors required for coloring $G$. Further, let $\omega (G)$ be the size of the maximum clique in $G$, which is also a lower bound on the minimum number of colors required to color $G$. This gives the following relationship:

$$\begin{aligned} \omega (G) \le \chi (G) \le {\upchi }(G,\pi ) \le \Delta (G)+1 \end{aligned}$$

where $\Delta (G)$ is the maximum degree of $G$.
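The chain of inequalities can be checked by brute force on a very small graph. The sketch below uses a 5-cycle, where the clique bound is strict; the exhaustive routines `clique_number` and `chromatic_number` are assumptions for illustration and are exponential in the number of vertices, so they are only suitable for toy inputs.

```python
from itertools import combinations, product

def greedy_num_colors(adj, order):
    """Colors used by greedy coloring along `order` (at most max degree + 1)."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(1, len(adj) + 2) if c not in used)
    return max(color.values())

def clique_number(adj):
    """omega(G) by exhaustive search over vertex subsets (toy graphs only)."""
    vs = list(adj)
    for r in range(len(vs), 0, -1):
        for S in combinations(vs, r):
            if all(b in adj[a] for a, b in combinations(S, 2)):
                return r
    return 0

def chromatic_number(adj):
    """chi(G) by trying every assignment with k = 1, 2, ... colors (toy graphs only)."""
    vs = list(adj)
    for k in range(1, len(vs) + 1):
        for assign in product(range(k), repeat=len(vs)):
            col = dict(zip(vs, assign))
            if all(col[v] != col[u] for v in adj for u in adj[v]):
                return k
    return 0

# 5-cycle: 0 - 1 - 2 - 3 - 4 - 0
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
omega, chi = clique_number(adj), chromatic_number(adj)
greedy = greedy_num_colors(adj, order=range(5))
max_deg = max(len(ns) for ns in adj.values())
print(omega, chi, greedy, max_deg + 1)  # 2 3 3 3
```

For the 5-cycle the lower bound $\omega(G)=2$ is strictly below $\chi(G)=3$, while the greedy coloring and the $\Delta(G)+1$ bound both equal 3.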

An example of the framework is shown in Fig. 2. This illustration uses a proposed triangle selection criterion, which is shown later in Sect. 6 to be extremely effective for large social and information networks.

Table 1 Methods used as selection criterion
