1 Introduction

Web graphs represent the link structure of the Web. They are usually modeled as directed graphs where nodes represent pages and edges represent links among pages. Social networks, in turn, represent relationships among social entities. These networks are modeled by undirected or directed graphs depending on the relation they model. For instance, the friendship relation on Facebook is symmetric and is thus modeled by an undirected graph, whereas the “following” relation on Twitter and LiveJournal is not symmetric and is therefore modeled by a directed graph.

The link structure of Web graphs is often used by ranking algorithms such as PageRank [10] and HITS [38], as well as for spam detection [6, 50], for detecting communities [27, 39], and for understanding the structure and evolution of the network [26, 27]. A social network structure is often used for mining and analysis purposes, such as identifying interest groups or communities, detecting important actors [51, 57], and understanding information propagation [17, 37, 45]. Those algorithms use a graph representation that supports at least forward navigation (i.e., to the out-neighbors of a node or those pointed from it), and many require backward navigation as well (i.e., to the in-neighbors of a node or those that point to it).

Managing and processing these graphs are challenging tasks because Web graphs and social networks are growing in size very fast. For instance, a recent estimation of the indexable Web size states that it contains over 7.8 billion pages (and thus around 200 billion edges),Footnote 1 and Facebook has over 950 million active users worldwide.Footnote 2 Google has recently augmented the user search experience by introducing the knowledge graph,Footnote 3 which models about half a billion entities and 3.5 billion relationships among them. This knowledge graph is used in addition to the Web graph to improve search effectiveness.

Different approaches have been used to manage large graphs. For instance, streaming and semi-streaming techniques can be applied with the goal of processing the graph sequentially, ideally in one pass, although a few passes are allowed. The idea is to use main memory efficiently, avoiding random access to disk [25]. External memory algorithms define memory layouts that are suitable for graph algorithms, where the goal is to exploit locality in order to reduce I/O costs, reducing random accesses to disk [56]. Another approach is the use of distributed systems, where distributed memory is aggregated to process the graph [53]. However, depending on the problem, the synchronization and communication required may impose I/O costs similar to those of the external memory approach.

Compressed data structures aim to reduce the amount of memory used by representing graphs in compressed form while still answering the queries of interest without decompressing them. Even though these compressed structures are usually slower than uncompressed representations, they are still much faster than incurring I/O costs: They can be orders of magnitude faster when they make it possible to fit in main memory graphs that would otherwise require disk storage. In a distributed scenario, they allow the graphs to be deployed on fewer machines, yielding important savings in communication costs and energy.

Several proposals use compressed data structures for Web graphs, mainly enabling out-neighbor queries [4, 7, 21, 32]; yet, some also support bidirectional navigation (i.e., handle out/in-neighbor queries) [11, 20]. Some more recent ones address social networks [9, 19, 23, 43].

In this paper, we introduce new approaches to develop competitive compressed data structures for managing and processing large Web and social graphs. The main contributions of this work follow:

  • We enhance an existing technique to detect bicliques [16] so that it detects more general “dense subgraphs.” These include cliques, bicliques, and in general not necessarily disjoint pairs of node sets where all the nodes in the first set point to all the nodes in the second set.Footnote 4 We study the effectiveness of the technique and demonstrate that it captures a fair amount of the structure of Web graphs (more than 90 %) and social networks (around 60 %), improving upon the detection of bicliques (where the sets must be disjoint). We also show how to process large graphs in secondary memory. This new graph mining technique is key to the success of the compressed representations we develop.

  • We apply the “virtual node mining” technique of Buehrer and Chellapilla [16] on the discovered dense subgraphs, which replaces the edges of each dense subgraph by a virtual node with fewer links. We then list the nodes in the BFS order of Apostolico and Drovandi [4] and use their encoding. The result is a Web graph representation with out-neighbor query support that is either very close to or better than, in space and time, the best current representation [32]: On large Web graphs, it uses 1.0–1.8 bits per edge (bpe) and retrieves each neighbor in 0.6–1.0 microseconds (\(\upmu \)s). We show, however, that our technique is more robust, as it performs equally well on the transposed Web graph, whereas the one by Grabowski and Bieniecki [32] performs significantly worse.

  • By maintaining the BFS ordering after virtual node mining, but now using a bidirectional representation (k2-tree) on the resulting graph [11], we obtain the smallest existing representation with out/in-neighbor support: 0.9–1.5 bpe, much smaller than in the previous item. The price is that the query time is higher: 5–20  \(\upmu \)s per extracted neighbor.

  • We design a novel compressed data structure to represent the dense subgraphs that does not use virtual nodes. This representation supports not only out/in-neighbor navigation, but also various graph mining queries based on the dense subgraphs discovered, such as listing cliques and bicliques, retrieving density and size of the subgraphs, finding node participation in different subgraph patterns, and so on. While this technique is not competitive with the previous one on Web graphs (yet, it supports other queries), it excels in social networks, where it achieves the best spaces so far with support for out/in-neighbor queries: 4–13 bpe and 8–12  \(\upmu \)s per retrieved neighbor.

Conference versions of this work appeared at the SNA-KDD workshop [35] and at SPIRE [36]. This article extends that work with a thorough analysis of the quality of the dense subgraph finding algorithm, a secondary memory variant of the algorithm, its application to transposed Web graphs, improved combinations of the scheme with BFS orderings, and the study of other graph mining queries.

For all the experiments described in this paper, we used a Linux PC with 16 Intel Xeon processors at 2.4 GHz, 72 GB of RAM, and 12 MB of cache. We used the g++ compiler with full optimization.

2 Related work

We divide this section into two parts. First, we survey compression techniques for Web and social graphs and the queries they support. Second, we discuss compact data structures based on bitmaps and symbol sequences that provide guarantees in terms of space and access times. Such structures are the basis for the compressed data structure we present in Sect. 5.

2.1 Compressed representations for Web and social graphs

Compressing Web graphs has been an active research area for some time. Suel and Yuan [52] built a tool for Web graph compression distinguishing global links (pages on different hosts) from local ones (pages on the same host) and combining different coding techniques, such as Huffman and Golomb codes. Adler and Mitzenmacher [1] achieved compression by using similarity. The idea was to code an adjacency list by referring to an already coded adjacency list of another node that points to many of the same pages. They used this idea with Huffman coding to achieve compression of global links. Randall et al. [48] proposed lexicographic ordering of URLs as a way to exploit locality (i.e., that pages tend to have hyperlinks to other pages on the same domain) and similarity of (nearby) adjacency lists for compressing Web graphs.

Later, Boldi and Vigna [7] proposed the WebGraph framework. This approach also exploits power-law distributions, similarity and locality using URL node ordering. Essentially, given a node ordering that enhances locality and similarity of nearby lists, WebGraph uses an encoding based on gaps and pointers to near-copies that takes advantage of those properties. The main parameters of this compression technique are \(w\) and \(m\), where \(w\) is the window size and \(m\) is the maximum reference count. The window size means that the list \(l_i\) can only be expressed as a near-copy of \(l_{i-w}\) to \(l_{i-1}\), whereas the reference count of list \(l_i\) is \(r(l_i)=0\) if it is not expressed as a near-copy of another list, or \(r(l_i)=r(l_j)+1\) if \(l_i\) is encoded as a near-copy of list \(l_j\). Increasing \(w\) and \(m\) improves compression ratio, but also increases access time.
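To make the reference mechanism concrete, here is a toy Python sketch (our simplification, not the actual WebGraph codec, which also uses interval encoding, \(\zeta \) codes, and the bound \(m\) on reference chains): list \(l_i\) is expressed against the best of its \(w\) preceding lists, storing the reference offset, one copy flag per entry of the referenced list, and the remaining neighbors as gaps.

```python
def encode_list(lists, i, w=7):
    """Toy reference encoding in the spirit of WebGraph: express lists[i] as a
    near-copy of one of the w preceding lists plus a gap-coded residual.
    lists: list of sorted adjacency lists (lists of ints)."""
    target = set(lists[i])
    best_off, best_common = 0, set()            # offset 0 means "no reference used"
    for off in range(1, min(w, i) + 1):
        common = set(lists[i - off]) & target
        if len(common) > len(best_common):
            best_off, best_common = off, common
    ref = lists[i - best_off] if best_off else []
    copy_flags = [1 if v in best_common else 0 for v in ref]    # entries copied from ref
    extras = sorted(target - best_common)                       # neighbors coded explicitly
    gaps = [extras[0]] + [b - a for a, b in zip(extras, extras[1:])] if extras else []
    return best_off, copy_flags, gaps
```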

In a later work, Boldi et al. [8] explored existing and novel node ordering methods, such as URL, lexicographic, and Gray orderings. More recently, Boldi et al. [9] designed node orderings based on clustering methods and achieved improvements on compressing Web graphs and social networks with a clustering algorithm called layered label propagation (LLP). A different and very competitive node ordering was proposed by Apostolico and Drovandi [4]. Their approach orders the nodes according to a breadth-first traversal (BFS) of the graph and then applies their own encoding, which takes advantage of the BFS order. They encode the out-degrees of the nodes in the order given by the BFS traversal, plus a list of the edges that cannot be deduced from the BFS tree. They achieve compression by dividing those lists into chunks and taking advantage of locality and similarity. The compression scheme works on chunks of \(l\) nodes. Parameter \(l\) (called the level) provides a tradeoff between compression performance and the time to retrieve the adjacency list of a node.

Buehrer and Chellapilla [16] exploited the existence of many groups of pages that share the same outlinks, which define complete bipartite subgraphs (bicliques). Their approach reduces the number of edges by adding virtual nodes to the graph that connect the two sets of each biclique. They applied this process iteratively on the graph until the edge reduction gain was no longer significant and then applied delta codes on the edge-reduced graph. However, they did not report times for extracting neighbors. They call this scheme virtual node mining (VNM). Anh and Moffat [3] also exploit similarity and locality of adjacency lists, but they divide the lists into groups of \(h\) consecutive lists. A \(model\) for a group is built as the union of the group lists. They reduce the lists by replacing consecutive sequences appearing in all \(h\) lists by a new symbol. The process can be made recursive by applying it to the \(n/h\) representative lists. They finally apply codes such as \(\zeta \)-codes [7] over all lists. This approach is somewhat similar to that of Buehrer and Chellapilla [16], but Anh and Moffat [3] do not specify how they actually detect similar consecutive lists.

Grabowski and Bieniecki [32] (see also [31]) recently provided a very compact and fast technique for Web graphs. Their algorithms operate on blocks consisting of multiple adjacency lists, in a way similar to the work of Anh and Moffat [3], reducing edge redundancy, but they use a compact stream of flags to reconstruct the original lists. Their encoding is basically a reversible merge of all the lists. The parameter \(h\) sets the number of adjacency lists stored in a block; increasing \(h\) improves the compression ratio at the cost of access time.

Another approach that can also be seen as decreasing the total number of edges and adding virtual nodes was proposed by Claude and Navarro [21]. This approach is based on Re-Pair [40], a grammar-based compressor, which repeatedly finds the most frequent pair of symbols in a sequence of integers and replaces it with a new symbol.

Most Web graph compression schemes (such as the ones described above) support out-neighbor queries, that is, retrieving the list of nodes pointed from a given node, just as an adjacency list does. Being able to solve in-neighbor queries (i.e., the list of nodes pointing to a given node) is interesting for many applications, from random sampling of graphs to various types of mining and structure discovery activities, as mentioned in Sect. 1. It is also useful for representing undirected graphs without having to store each edge twice.

Brisaboa et al. [11] exploited the sparseness and clustering of the adjacency matrix to reduce space while providing out/in-neighbor navigation in a natural symmetric form, using a structure called k2tree. They have recently improved their results by applying BFS node ordering on the graph before building the k2tree [12]. This achieves the best known space/time tradeoffs supporting out/in-neighbor access for Web graphs. The k2tree scheme represents the adjacency matrix by a \(k^2\)-ary tree of height \(h=\lceil \log _k n \rceil \) (where \(n\) is the number of vertices). It divides the adjacency matrix into \(k^2\) submatrices of size \(n/k \times n/k\). Completely empty submatrices are represented just with a 0-bit, whereas nonempty submatrices are marked with a 1-bit and recursively subdivided. The leaf nodes contain the actual bits of the adjacency matrix, in compressed form. Recently, Claude and Ladra [23] improved the compression performance on Web graphs by combining the k2tree with the Re-Pair-based representation [21]. Another representation able to solve out/in-neighbor queries [20] was obtained by combining the Re-Pair-based representation [21] with compact sequence representations [22] of the resulting adjacency lists; its times for out- and in-neighbor queries are not symmetric.
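As an illustration of the recursive subdivision (a didactic Python sketch only, with none of the rank structures or compressed leaves of the actual k2tree), the following function produces the level-wise bitmaps for a small adjacency matrix whose side is a power of \(k\):

```python
def k2tree_bits(M, k=2):
    """Level-wise bitmaps of a k2-tree for adjacency matrix M (a list of 0/1 rows),
    assuming the matrix side is a power of k. The last level holds the actual
    matrix bits of the nonempty areas."""
    def nonempty(r, c, size):
        return any(M[i][j] for i in range(r, r + size) for j in range(c, c + size))

    levels, queue = [], [(0, 0, len(M))]        # pending submatrices: (row, col, side)
    while queue and queue[0][2] > 1:
        child = queue[0][2] // k
        bits, nxt = [], []
        for r, c, _ in queue:
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i * child, c + j * child
                    full = nonempty(rr, cc, child)
                    bits.append(1 if full else 0)     # 1-bit: nonempty, keep subdividing
                    if full and child > 1:
                        nxt.append((rr, cc, child))
        levels.append(bits)
        queue = nxt
    return levels
```

For a 4×4 matrix with a single 1, this yields one set bit per level along the path to that cell, which is the source of the space savings on sparse, clustered matrices.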

Some recent works on compressing social networks [19, 43] have unveiled compression opportunities as well, although to a much lesser degree than on Web graphs. The approach by Chierichetti et al. [19] is based on the WebGraph framework [7], using shingle ordering (based on the Jaccard coefficient) [13, 28] and exploiting link reciprocity. Even though they achieve interesting compression for social networks, their approach requires decompressing the graph in order to retrieve the out-neighbors. Maserrat and Pei [43] achieve compression by defining an Eulerian data structure using a multi-position linearization of directed graphs. This scheme is based on decomposing the graph into small dense subgraphs and supports out/in-neighbor queries in sublinear time. Claude and Ladra [23] improve upon this scheme by combining it with compact data structures.

2.2 Compact data structures for sequences

We make use of compact data structures based on bitmaps (sequences of bits) and sequences of symbols. These sequences support the operations \(rank\), \(select\), and \(access\). Operation \(rank_B(b,i)\) on the bitmap \(B[1,n]\) counts the number of times bit \(b\) appears in the prefix \(B[1,i]\). Operation \(select_B(b,i)\) returns the position of the \(i\)th occurrence of bit \(b\) in \(B\) (or \(n+1\) if there are fewer than \(i\) occurrences of \(b\) in \(B\)). Finally, operation \(access_B(i)\) retrieves the value \(B[i]\). A solution requiring \(n+o(n)\) bits and providing constant time for rank/select/access queries was proposed by Clark [24], and good implementations are available (e.g., RG [29]). Later, Raman et al. [49] managed to compress the bitmap while retaining constant query times. The space becomes \(nH_0(B) + o(n)\) bits, where \(H_0(B) = \frac{n_0}{n} \log \frac{n}{n_0} + \frac{n_1}{n} \log \frac{n}{n_1} \le 1\) is the zero-order entropy of \(B\), which has \(n_0\) zeros and \(n_1\) ones (we use binary logarithms by default). Good implementations are also available (e.g., RRR [22]).
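As a concrete, if naive, reference point, the following Python class implements the three operations with the 1-based semantics just defined, using plain prefix counts and a linear-scan select rather than the \(o(n)\)-bit or entropy-compressed structures of Clark and Raman et al.:

```python
class Bitmap:
    """Didactic bitmap supporting access/rank/select (1-based positions)."""
    def __init__(self, bits):
        self.bits = list(bits)
        self.prefix_ones = [0]                 # prefix_ones[i] = number of 1s in B[1..i]
        for b in self.bits:
            self.prefix_ones.append(self.prefix_ones[-1] + b)

    def access(self, i):                       # B[i]
        return self.bits[i - 1]

    def rank(self, b, i):                      # occurrences of bit b in B[1..i]
        ones = self.prefix_ones[i]
        return ones if b == 1 else i - ones

    def select(self, b, j):                    # position of the j-th b, or n+1
        count = 0
        for pos, bit in enumerate(self.bits, start=1):
            if bit == b:
                count += 1
                if count == j:
                    return pos
        return len(self.bits) + 1
```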

The bitmap representations can be extended to compact data structures for sequences \(S[1,n]\) over an alphabet \(\Sigma \) of size \(\sigma \). The wavelet tree (WT) [33] supports rank/select/access queries in \(O(\log \sigma )\) time. It uses bitmaps internally, and its total space is \(n \log \sigma + o(n) \log \sigma \) bits if those bitmaps are represented using RG, or \(nH_0(S)+o(n) \log \sigma \) bits if using RRR, where \(H_0(S) = \sum _{c \in \Sigma } \frac{n_c}{n} \log \frac{n}{n_c} \le \log \sigma \), \(n_c\) being the number of occurrences of \(c\) in \(S\). As our alphabets will be very large, we use the version “without pointers” [22], which avoids an extra space term of the form \(O(\sigma \log n)\). Another sequence representation (GMR) [30] uses \(n\log \sigma + n\,o(\log \sigma )\) bits and supports \(rank\) and \(access\) in time \(O(\log \log \sigma )\), and \(select\) in \(O(1)\) time.

3 Dense subgraphs

In this section, we describe the algorithm to discover dense subgraphs, such as bicliques, cliques, and generalizations, and study the quality of our algorithm. This technique is the basis for all the compressed representations that follow.

3.1 Basic notions

We represent a Web graph as a directed graph \(G=(V,E)\) where \(V\) is a set of vertices (pages) and \(E \subseteq V \times V\) is a set of edges (hyperlinks). For an edge \(e=(u,v)\), we call \(u\) the source and \(v\) the center of \(e\). In social networks, nodes are individuals (or other types of agents) and edges represent some relationship between two nodes. These graphs can be directed or undirected. When they are undirected, we make them directed by representing each undirected edge as two reciprocal directed edges. Thus, from now on, we consider only directed graphs.

We follow the idea of “dense communities” in the Web described by Kumar et al. [39] and Dourisboure et al. [27], where a community is defined as a group of pages related to a common interest. Such Web communities are characterized by dense directed bipartite subgraphs. In fact, Kumar et al. [39] observe that a “random large enough and dense bipartite subgraph of the Web almost surely has a core (a complete bipartite subgraph)”, which they aim to detect. The left sets of these dense subgraphs are called Fans and the right sets are called Centers. In this work, we call the sets Sources (S) and Centers (C), respectively, which is the naming used by Buehrer and Chellapilla [16]. One important difference of our work from Kumar et al. [39] and Dourisboure et al. [27] is that we do not remove edges before applying the discovery algorithm. In contrast, both works [27, 39] remove all nepotistic links, that is, links between two pages that belong to the same domain. In addition, Dourisboure et al. [27] remove isolated pages, that is, pages with no out-neighbors and no in-neighbors.

For technical reasons that will become clear soon, we add all the edges \((u,u)\) to our directed graphs. We use a small bitmap of \(|V|\) bits to mark which nodes \(u\) actually had a self-loop, and use it to remove the spurious self-loops from the edges output by our structures.

We also note that the discovery algorithm is applied over Web graphs in natural node ordering [9], which is basically URL ordering, because it provides better results than other node orderings.

We will find patterns of the following kind.

Definition 3.1

A dense subgraph \(H(S,C)\) of \(G=(V,E)\) is a graph \(G^{\prime }(S\cup C,S \times C)\), where \(S,C \subseteq V\).

Note that Definition 3.1 includes cliques (\(S=C\)) and bicliques (\(S\cap C =\emptyset \)), but also more general subgraphs. Our goal is to represent the \(|S|\cdot |C|\) edges of a dense subgraph using \(O(|S|+|C|)\) space. Two different techniques to do so are explored in Sects. 4 and 5.
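For instance (as a purely illustrative example), a dense subgraph with \(|S|=20\), \(|C|=30\), and \(|S\cap C|=5\) contains \(20\cdot 30=600\) edges but only \(20+30-5=45\) distinct nodes, so listing its nodes instead of its edges already saves more than an order of magnitude.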

3.2 Discovering dense subgraphs

In this section, we describe how we discover dense subgraphs. Even finding a clique of a certain size is NP-complete, and the existing algorithms require time exponential in that size (e.g., Algorithm 457 [15]). Thus, we need to resort to fast heuristics for the huge graphs we are interested in. Besides, we want to capture other types of dense subgraphs, not just cliques. We first use a scalable clustering algorithm [16], which uses the idea of “shingles” [28]. Once the clustering has identified nodes whose adjacency lists are sufficiently similar, we run a heavier frequent itemset mining algorithm [16] inside each cluster. This mining algorithm finds sets of nodes \(S\) that point to all the elements of another set of nodes \(C\) (they can also point to other nodes).

This algorithm was designed to find bicliques: A node \(u\) cannot belong to both \(S\) and \(C\) unless \((u,u)\) is an edge. As those edges are rare in Web graphs and social networks, the algorithm misses more general dense subgraphs and is essentially restricted to finding bicliques.

To make the algorithm sensitive to dense subgraphs, we insert all the edges \(\{ (u,u), u \in V\}\) in \(E\), as anticipated. This is sufficient to make the frequent itemset mining algorithm find the more general dense subgraphs. The spurious edges added are removed at query time, as explained.

The clustering algorithm represents each adjacency list with \(P\) fingerprints (hash values), generating a matrix of fingerprints of \(|V|\) rows and \(P\) columns. Then, it traverses the matrix column-wise. At stage \(i\), the matrix rows are sorted lexicographically by their first \(i\) column values, and the algorithm groups the rows with the same fingerprints in columns 1 to \(i\). When the number of rows in a group falls below a small number, it is converted into a cluster formed by the nodes corresponding to the rows. Groups that remain after the last column is processed are also converted into clusters.
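The following Python sketch captures the spirit of this clustering step. It uses \(P\) independent min-hash functions (one minimum per function) and an illustrative group-size cut-off; both are our simplifying choices rather than the exact fingerprinting of the implementation.

```python
import random
from collections import defaultdict

def shingle_clusters(adj, P=2, small=50, seed=1):
    """Group nodes with similar adjacency lists via P min-hash fingerprints per list,
    refining groups column by column. adj: {node: set of out-neighbors}."""
    rng = random.Random(seed)
    prime = (1 << 61) - 1                       # a Mersenne prime for the hash family
    funcs = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(P)]
    # one row of P min-hashes per (nonempty) adjacency list
    rows = [(tuple(min((a * v + b) % prime for v in vs) for a, b in funcs), u)
            for u, vs in adj.items() if vs]
    clusters = []

    def refine(group, col):
        # small groups, or groups that survive all P columns, become clusters
        if col == P or len(group) <= small:
            clusters.append([u for _, u in group])
            return
        by_hash = defaultdict(list)
        for row in group:
            by_hash[row[0][col]].append(row)
        for sub in by_hash.values():
            refine(sub, col + 1)

    if rows:
        refine(rows, 0)
    return clusters
```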

On each cluster, we apply the frequent itemset mining algorithm, which discovers dense subgraphs within the cluster. This algorithm first computes the frequencies of the nodes mentioned in the adjacency lists and sorts each list by decreasing frequency of its nodes. Then, the graph nodes are sorted lexicographically according to their lists. Now each list is inserted into a prefix tree, discarding nodes of frequency 1. This prefix tree has a structure similar to the tree obtained by hierarchical termset clustering [47]. Each node \(p\) in the prefix tree has a label (a graph node id) and represents the sequence \(l(p)\) of labels from the root to \(p\). Such a node \(p\) also stores the range of graph nodes whose lists start with \(l(p)\).

Note that a tree node \(p\) at depth \(c=|l(p)|\) representing a range of \(s\) graph nodes identifies a dense subgraph \(H(S,C)\), where \(S\) is the set of graph nodes in the range stored at the tree node and \(C\) is the set of graph nodes listed in \(l(p)\). Thus, \(|S|=s\) and \(|C|=c\). We can therefore mark all the tree nodes \(p\) where \(s\cdot c\) is over the size threshold and extract them from largest to smallest saving (the savings must be recalculated each time we extract the largest one).
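The Python sketch below mirrors this prefix-tree step on a single cluster. It is greedy and purely illustrative: it reports every candidate above the threshold without recomputing savings or handling overlaps among the chosen subgraphs, as the actual algorithm does.

```python
from collections import Counter

class TrieNode:
    def __init__(self):
        self.children = {}      # center id -> TrieNode
        self.sources = set()    # graph nodes whose reordered list passes through here

def mine_cluster(cluster_adj, min_saving):
    """Find candidate dense subgraphs H(S, C) with |S|*|C| >= min_saving inside one
    cluster. cluster_adj: {source node: set of out-neighbors}."""
    freq = Counter(v for adj in cluster_adj.values() for v in adj)
    rank = {v: i for i, (v, f) in enumerate(freq.most_common()) if f >= 2}
    root = TrieNode()
    for u, adj in cluster_adj.items():
        node = root
        # reorder by decreasing center frequency, dropping frequency-1 centers
        for v in sorted((v for v in adj if v in rank), key=rank.get):
            node = node.children.setdefault(v, TrieNode())
            node.sources.add(u)
    results = []

    def walk(node, prefix):
        if prefix and len(node.sources) * len(prefix) >= min_saving:
            results.append((set(node.sources), list(prefix)))   # (S, C) candidate
        for v, child in node.children.items():
            walk(child, prefix + [v])

    walk(root, [])
    return sorted(results, key=lambda sc: len(sc[0]) * len(sc[1]), reverse=True)
```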

Figure 1a shows a dense subgraph pattern with the traditional representation, and Fig. 1b shows the way we represent them using the discovery algorithm described.

Fig. 1 Dense subgraph representation

The whole algorithm can be summarized in the following steps. Figure 2 shows an example.

  • Step 1 Clustering-1 (build the hashed matrix representing G) We traverse the graph, given as a set of adjacency lists, adding the edges \((u,u)\). Then, we compute \(P\) hash values for each adjacency list (hashing each of its edges \(P\) times and keeping the \(P\) smallest hash values). This step requires \(O(P |E|)\) time.

  • Step 2 Clustering-2 (build clusters) We build clusters by sorting the hash matrix column-wise and grouping the rows with equal hash values; the adjacency lists of each group form a cluster. This requires \(O(P |V| \log |V| )\) time.

  • Step 3 Mining-1 (reorder cluster edges) On each cluster, we compute edge frequencies, discard edges with frequency 1, and reorder the adjacency lists by decreasing edge frequency. This step takes \(O(|E| \log |E|)\) time.

  • Step 4 Mining-2 (discover dense subgraphs) We build a prefix tree for each cluster, with tree nodes labeled with node ids. The dense subgraphs \(G^{\prime }(S\cup C,S \times C)\) with the highest edge saving (\(|S|\times |C|\)) are identified in the tree. This step is bounded by \(O(|E|\log |E|)\) time.

Therefore, the overall algorithm time complexity, taking \(P\) as a constant, is bounded by \(O( |E| \log |E|)\).

Fig. 2 Example of the dense subgraph discovery process

In Sect. 4, each dense subgraph \(H(S,C)\) found will be replaced by a new virtual node whose in-neighbors are \(S\) and whose out-neighbors are \(C\). As the result is still a graph, the dense subgraph discovery process can be repeated on it. In Sect. 5, instead, the subgraphs \(H(S,C)\) will be extracted directly from the original graph and represented using a compact data structure.

3.3 Evaluation of the discovery algorithm

First, we evaluate the sensitivity to the number of hashes (parameter \(P\)) used in the first step of our clustering. To do so, we use a real Web graph (eu-2005, see Table 7) and measure the impact of \(P\) on various metrics that predict compression effectiveness. Table 1 shows the number of discovered cliques (# Cliques), the total number of edges in those cliques (\(|\)Cliques\(|\)), the number of bicliques (# Bicliques), the total number of edges in cliques and bicliques (Edges), the total number of nodes participating in cliques and bicliques (Nodes), and the ratio between the last two (Ratio, which gives the reduction factor obtained with our technique of Sect. 5). All these metrics show that using \(P=2\) is slightly better than other values. When increasing \(P\), the algorithm discovers more and smaller cliques and bicliques, but the overall gain, in terms of representing more edges with fewer vertices, is better with \(P=2\).

Table 1 Compression metrics using different \(P\) values with eu-2005

Second, we evaluate our subgraph discovery algorithm. To do so, we use the GTgraph suite of synthetic graph generators.Footnote 5 From this suite, we use the SSCA#2 generator to create random-sized clique graphs [5, 18]. We use the parameter MaxCliqueSize to set the maximum size of cliques (MC), set the Scale parameter to 16, 17, or 20, so as to define \(2^{16}\), \(2^{17}\), or \(2^{20}\) vertices in the graph, and set the parameter ProbIntercliqueEdges = 0.0 (which tells the generator to create a clique graph, that is, a graph consisting of isolated cliques). Therefore, with this generator, we can control precisely the cliques present in the graph and their sizes. We call those real cliques.

We also use the R-MAT generator of the suite to create a power-law graph without any cliques. The properties of the synthetic clique graphs and the power-law graph are described in Table 2. The first graph, PL, is the power-law graph, whereas the others (V16, V17, V20) are clique graphs. Finally, we define new graphs (PL-V16, PL-V17, and PL-V20), which are the result of merging PL with V16, V17, and V20, respectively. The merging is done by computing the union of the edge sets of the PL graph and the corresponding clique graph. That is, both PL and Vxx share the same set of nodes (labeled \(1\) to \(|V|\)), and we take the union of the edges of both graphs. We apply our dense subgraph discovery algorithm on those merged graphs, whose features are displayed in Table 3. Figure 3 (left) shows the out-degree histogram for the PL, V17 (with \(MC=100\)), and PL-V17 graphs. We evaluate the ability of our discovery algorithm to extract all the real cliques from these graphs.

Table 2 Synthetic clique graphs with different number of nodes (Nodes), edges (Edges), maximum clique size (\(MC\)), and total number of vertices participating in cliques (\(R\))
Table 3 Synthetic merged power-law and clique graphs
Fig. 3 Out-degree histograms (left) and average relative error (right) in synthetic graphs

For evaluation purposes, we also use MCL (Markov Cluster Process), a clustering algorithm [54] that was later analyzed mathematically [55] and has been applied mostly in bioinformatics [14], but also in social network analysis [44]. MCL simulates a flow by alternating matrix expansion and matrix inflation, where expansion means taking the power of a matrix using the matrix product, and inflation means taking the Hadamard power followed by a diagonal scaling. MCL deals with both labeled and unlabeled graphs, while the clustering we use deals only with unlabeled graphs. We compare our clustering against MCL clusteringFootnote 6 by replacing the first steps (cluster finding) of our discovery algorithm with MCL.

To measure how similar the discovered and real clique sets are, we compute the average relative error (\(ARE\)), the average relative difference between the real and discovered clique sizes:

$$\begin{aligned} ARE = \displaystyle \frac{1}{|R|}\sum _{i \in R} \frac{|r_i - \hat{r}_i|}{r_i} ~, \end{aligned}$$
(1)

where \(r_i\) and \(\hat{r}_i\) are the real and discovered clique sizes, and \(|R|\) is the number of real cliques. We consider a real clique to be “discovered” if we find more than half of its vertices.

We also evaluate the discovery algorithm based on precision and recall:

$$\begin{aligned} precision&= \frac{\sum _{i \in R}|RCE \cap DCE|}{\sum _{i \in R}|DCE|} , \end{aligned}$$
(2)
$$\begin{aligned} recall&= \frac{\sum _{i \in R}|RCE \cap DCE|}{\sum _{i \in R}|RCE|} , \end{aligned}$$
(3)

where \(RCE\) and \(DCE\) denote the node set of the \(i\)th real clique and of its corresponding discovered clique, respectively.

In addition, we compare the number of discovered cliques (\(|A|\)) with the number of real cliques:

$$\begin{aligned} recallNumCliques = \frac{|A|}{|R|} ~. \end{aligned}$$
(4)
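The following Python sketch computes the four measures, under the assumption that each real clique has already been matched to its discovered counterpart (a missed clique is paired with an empty set); that matching step is not shown.

```python
def evaluation_metrics(matches, num_discovered_cliques):
    """ARE, precision, recall (Eqs. 1-3) and recallNumCliques (Eq. 4).
    matches: list of (real_nodes, discovered_nodes) set pairs, one per real clique,
    with discovered_nodes empty if the clique was missed."""
    R = len(matches)
    are = sum(abs(len(r) - len(d)) / len(r) for r, d in matches) / R
    inter = sum(len(r & d) for r, d in matches)
    precision = inter / max(1, sum(len(d) for _, d in matches))
    recall = inter / sum(len(r) for r, _ in matches)
    recall_num_cliques = num_discovered_cliques / R
    return are, precision, recall, recall_num_cliques
```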

In order to compare the clustering algorithms, we first measure execution times. We execute the version of the discovery algorithm that uses MCL with only one iteration and \(I=2.0\) (the default setting of the inflation parameter). We also execute our clustering with 40 to 100 iterations, in order to reach similar clustering quality (our iterations are much faster than those of MCL). Table 4 shows the number of discovered cliques (\(|A|\)), their average sizes (\(avg\)), and the average time in milliseconds (\(tms\)) to retrieve a clique when using our dense subgraph algorithm. We also add the corresponding values obtained using MCL clustering (\(|A|\)M, \(avg\)M). The MCL execution time (\(tms\)M) corresponds to sequential time, whereas \(ptms\)M corresponds to parallel execution with 16 threads. Our current discovery algorithm implementation is sequential; its parallel version, which is under construction, should improve execution times. Still, our sequential algorithm is already an order of magnitude faster than sequential MCL. Our approach works better than MCL for graphs that have fewer cliques, as in PL-V16 and PL-V17. In such cases, even our sequential time with multiple iterations is much lower than that of one iteration of parallel MCL with 16 threads. For graphs that contain more cliques and small MC values, the time of our sequential algorithm is comparable to parallel MCL using 16 threads; yet, as the cliques grow, MCL does not scale well and even its parallel version becomes slower than ours.

Table 4 Time required per retrieved clique of different sizes

Figure 3 (right) shows that the \(ARE\) (Eq. 1) values of our strategy are very low (\(<\)0.06, i.e., 6 %) and that the error grows only slightly as the number of cliques in the graph increases. When replacing our clustering algorithm with MCL, instead, the average relative error increases when the graph contains fewer or smaller hidden cliques. On the other hand, in all cases we obtain a precision of 1.0, which means that we only recover existing cliques. Figure 4 (left) shows recall (Eq. 3), and again we observe that our discovery algorithm behaves very well (more than 0.93, i.e., 93 %) for different numbers and sizes of cliques hidden in the graphs. In contrast, MCL is very sensitive to the number and size of cliques, being less effective for fewer or smaller cliques. We see a similar behavior in Fig. 4 (right), where we measure \(recallNumCliques\) (Eq. 4).

Fig. 4 Recall on the number of vertices (left) and on the number of cliques (right) discovered in synthetic graphs

To summarize, our discovery strategy finds 98–99 % of the cliques [Fig. 4 (right)] and recovers their vertices with average relative errors between 1 and 6 % [Fig. 3 (right)]. The performance is better for larger cliques. One possible reason is that the clustering algorithm we use tends to find greater similarity among adjacency lists that have more vertices in common.

We also evaluate the impact on scalability and compression (described in Sect. 5) of using MCL over a real undirected social graph (dblp-2011, see Table 7). We execute MCL with different values of the inflation parameter (\(I\)). Table 5 shows the compression (bpe), the sequential execution time (tms), and the parallel execution time with 16 threads (ptms). It shows that our clustering approach outperforms MCL, achieving less space than its slowest configuration within the time of its fastest one.

Table 5 Compression (bpe) and time using MCL with different inflation \(I\) values for dblp-2011

To confirm the scalability problems of MCL, we also execute it over a larger graph, namely eu-2005 (which is the smallest Web graph we use, see Table 7). We use different \(I\) values, from \(I=1.2\) to \(I=4.0\) (using \(I=6.0\) takes more than 2 days). We use parallel MCL with 16 threads; sequential MCL was disregarded since the parallel execution is already several orders of magnitude slower than our sequential algorithm. Table 6 shows the results, where we also give the achieved compression in \(bpe\) using our compressed structure with compact data structures (Sect. 5). Using the compression scheme described in Sect. 4 is an order of magnitude faster. This confirms that the clustering we use in our discovery algorithm is much more scalable than MCL.

Table 6 Compression (bpe) and time using MCL with different inflation values \(I\) for eu-2005
Table 7 Main statistics of the Web graphs we used in our experiments

The MCL scalability issue has been reported in several works [34, 42, 44, 46]. In fact, Mishra et al. [46] report that MCL performs poorly on sparse graphs. Macropol and Singh [42] proposed a scalable algorithm that discovers the best clusters (based on a score metric) in labeled graphs. Their clustering algorithm is similar to ours, but targets labeled graphs; they use Locality-Sensitive Hashing (LSH) and achieve better performance than MCL. Additionally, the time complexity of our algorithm is \(O(|E|\log |E|)\), while a straightforward implementation of MCL takes \(O(|V|^3)\) time, as mentioned in the FAQ section of the MCL web site.Footnote 7 Another issue with MCL is that it does not guarantee good effectiveness on directed graphs.Footnote 8

4 Using virtual nodes

In this section, we describe compact graph representations based on using virtual nodes to compress the dense subgraphs. Depending on the representation of the final graph, we obtain various structures supporting out-neighbor and out/in-neighbor navigation.

In a first phase, we apply the discovery of dense subgraphs explained in Sect. 3. Then, we apply the idea of virtual nodes [16] over the original graph, to factor out the edges of the dense subgraphs found. Given a dense subgraph \(H(S,C)\), we introduce a new virtual node \(w\) in \(V\) and replace all the edges in \(S \times C\) by those in \((S \times \{w\}) \cup (\{w\} \times C)\).

As the result is still a graph, we iterate on the process. On each iteration, we discover dense subgraphs in the current graph and replace their edges using virtual nodes. We refer to this approach as dense subgraph mining (DSM).

The outcome of this phase is a graph equivalent to the original one, in the sense that we must expand the paths that go through virtual nodes to find all the direct neighbors of a node. The new graph has far fewer edges and only a small number of virtual nodes in addition to the original graph nodes.
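A minimal Python sketch of the edge replacement, assuming the graph is kept as a dictionary of out-neighbor sets and that all \(|S|\cdot |C|\) edges of the dense subgraph are present:

```python
def replace_with_virtual_node(adj, S, C, next_virtual_id):
    """Replace the |S|*|C| edges of a dense subgraph H(S, C) by |S|+|C| edges
    through a fresh virtual node. adj: {node: set of out-neighbors}, modified
    in place; S and C are sets. Returns the virtual node id used."""
    w = next_virtual_id                 # identifiers above the original ones mark virtual nodes
    for u in S:
        adj[u] -= C                     # remove the dense-subgraph edges leaving u...
        adj[u].add(w)                   # ...and redirect them through w
    adj[w] = set(C)                     # w points to every center
    return w
```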

On a second phase, we apply different state-of-the-art compression techniques and node orderings over this graph to achieve compression and fast out- and out/in-neighbor queries.

This process has three parameters: \(ES\) specifies the minimum size \(|S|\cdot |C|\) of the dense subgraphs we want to capture during the discovery, \(T\) is the number of iterations we carry out to discover dense subgraphs, and \(P\) is the number of hashes used in the clustering stage of the dense subgraph discovery algorithm.

As explained, we input the graph in natural ordering to the DSM algorithm. If we retain this order on the output and give virtual nodes identifiers larger than those of the original nodes, we can easily distinguish which nodes are virtual and which are original. If, instead, we use a different ordering on the output, such as BFS, we need an additional bitmap to mark which nodes are virtual.

4.1 Dense subgraph mining effectiveness

In the experiments of this section, we use Web graph snapshots available from the WebGraph project.Footnote 9 Table 7 gives the main statistics of the Web graphs used. We define \(G1(V1,E1)\) as the original Web graph and \(G2(V2,E2)\) as the result of removing the \((u,u)\) edges from \(G1\) (as explained, we store a bitmap marking which of those edges were originally present). Algorithm DSM operates on \(G2\) (where it starts by adding \((u,u)\) for every node). We call \(G3(V3,E3)\) the outcome of the DSM algorithm, where \(V3=V1 \cup V\!N\), \(V\!N\) is the set of virtual nodes added, and \(E3\) is the set of resulting edges. We always use \(P=2\) for DSM.

Table 8 shows the main statistics of \(G3\), using \(ES=6\) and carrying out \(T\) iterations. The table also shows the number of virtual nodes (\(|V\!N|\)), the resulting average arity (\(d3\)), the size gain estimation based on the edge reduction, given by \(|E2|/|E3|\), and the total execution time (ET) in minutes. The edge reduction is significant, from 5X to 9X, whereas the increase in nodes is moderate, 7–20 %.

Table 8 Main statistics on the DSM reduced graphs

4.2 Performance evaluation with out-neighbor support

In this section, we evaluate the space and time performance when supporting out-neighbor queries, by applying DSM and then state-of-the-art compression on the resulting graph. For the second phase, we use BV (version 3.0.1 from WebGraph, which uses LLP ordering [9]) and AD (version 0.2.1 of their software,Footnote 10 giving it the input in natural order [4]). We compare our results with the best alternatives, including BV [9], AD [4], and GB [32]. Combining DSM with GB was slightly worse than GB standalone, so we omit that combination. We also omit other representations that have been superseded over time [21].

Table 9 shows the compression achieved with the combinations. The parameters of each technique are tuned to provide the best performance. We refer to BV as applying BV with parameters \(m=100\) and \(w=7\), where \(m\) is the maximum reference chain and \(w\) is the window size [those parameter values improve compression, but increase access times a little, as observed in Fig. 5 (left)]; AD\(_l\) as using AD with parameter \(l\); and GB\(_{h}\) as using GB with parameter \(h\). For our representations, we add a bitmap of length \(|V|\) marking which nodes have a self-loop (as our technique otherwise loses this information); we compress this self-loop bitmap with RRR. We compute bits per edge (bpe) as the total number of bits of the compressed graph plus the self-loop bitmap, divided by \(|E1|\).

Table 9 Compression performance in bpe, with support for out-neighbor queries

We refer to DSM-ES\(x\)-T\(y\) as using \(ES=x\) and iterating DSM for \(T=y\) times. We tuned our combinations using DSM with BV\(_{m3w7}\) (DSM-ES\(x\)-T\(y\)+BV) and DSM with AD\(_8\) (DSM-ES\(x\)-T\(y\)+AD\(_8\)). Using DSM with BV, we found that the best \(ES\) values were 30 for eu-2005 and 100 for indochina-2004, uk-2002, and arabic-2005; while the best \(T\) value was 10. On the other hand, the best \(ES\) value when combining DSM with AD was 10 for eu-2005 and arabic-2005, and 15 for indochina-2004 and uk-2002. Those are the \(x\) values that correspond to ES\(x\) in the table.

Table 9 shows that GB outperforms BV and AD by a wide margin. Among our representations, the one using \(T=10\) combined with AD\(_8\) gives the best results. Overall, in most datasets, the best compression ratio for accessing out-neighbors is achieved by GB\(_{128}\), but our technique is very close on uk-2002 and arabic-2005, and we slightly outperform it on indochina-2004. Only on the smallest graph, eu-2005, is GB\(_{128}\) better by far. Nevertheless, as observed in Fig. 5 (right), on the transposed graphs our technique achieves better compression and access time than GB\(_h\), and the sum favors our techniques when supporting both in- and out-neighbors (i.e., when storing both the direct and reverse graphs).

Figure 5 (left) shows the space/time tradeoffs achieved using BV, AD, and GB (with parameter values \(h=8\), 32, 64, 128), compared to using DSM before applying BV or AD. When combining DSM with BV, we used the optimum \(ES\) values mentioned above and ran BV with parameters \(w=7\) and \(m=3\), 100, and 1,000. When combining with AD, we also used the optimum \(ES\) value and tested different values of \(l\) for AD in the second phase. We did not use a larger \(T\) because the edge reduction obtained did not compensate for the extra virtual nodes added. We compute the time per edge by measuring the total time, \(t\), needed to extract the out-neighbors of all vertices in \(G1\) in random order, and then dividing \(t\) by the total number of recovered edges (i.e., \(|E1|\)).

Fig. 5 Space/time efficiency with out-neighbor queries

We observe that both BV and AD improve when combined with DSM. In particular, the combination of DSM with AD dominates BV, AD, and DSM plus BV. It achieves almost the same space/time performance as GB, which dominates all the others, and surpasses it in graph in-2004. Only in the smallest graph, eu-2005, does GB clearly dominate our combination.

Figure 5 (right) shows the same results on the transposed graphs. Note that the DSM preprocessing is the same for the original and the transposed graphs, so we preprocess the graph once and then represent both the reduced original and the reduced transposed graph. On the transposed graphs, we observe that the alternative combining DSM with BV actually performs worse than plain BV on the large graphs. GB does not perform as well as on the original graphs, but on eu-2005 it is still the best alternative. AD behaves very well on uk-2002, but our best combination outperforms it on the other datasets. In fact, our best combination is one of the two best alternatives on all datasets.

Figure 6 shows the space required to store the original plus the transposed graphs, combined with the time for out-neighbor queries (which is very similar to that for in-neighbor queries; these are run on the transposed graph). It can be seen that our new combinations of DSM plus AD dominate most of the space/time tradeoff, except on eu-2005. However, a data structure specific for out/in-neighbor queries (k2part [23]) offers comparable (and in some graphs much better) time performance, but we outperform it in space, considerably on some graphs.

Fig. 6 Space/time efficiency with out/in-neighbor queries

Next, we will consider a truly bidirectional representation for the reduced graph, obtaining much less space with higher query time.

4.3 Performance evaluation with out/in-neighbor support

In this section, we combine the output of DSM with a compression technique that supports out/in-neighbor queries: the k2tree [11]. We use the best current implementation [12]. We apply dense subgraph discovery with parameters \(ES=10,15,100\) and \(T=5,10\). In all cases, the DSM process is run over the graph in natural order. We denote k2treeBFS the variant that switches to BFS order on \(G3\) when applying the k2tree representation, and k2treeNAT the variant that retains natural order.

Table 10 shows the compression achieved. We observe that the compression ratio is markedly better when using BFS ordering. In particular, the setting \(ES=10\), \(T=10\) with k2treeBFS is always the best. The space is also much better than that achieved by representing the original plus transposed graphs in Sect. 4.2.

Table 10 Compression performance when combining with k2trees

Figure 7 shows the space/time tradeoff when solving out-neighbor queries (in-neighbor times are very similar). We include k2treeNAT [11], k2treeBFS [12], k2part [23], and disregard other structures that have been superseded by the last k2tree improvements [20]. We also include in the plots one choice DSM-ES\(x\)-T\(y\)+AD from Sect. 4.2, which represents the direct and transposed graphs using DSM and \(T=10\) combined with AD using various values of \(l\).

Fig. 7 Space/time efficiency with out/in-neighbor queries

All those structures are clearly superseded in space by our new combinations of DSM and k2treeBFS or k2treeNAT. Again, the combination with BFS gives much better results, and using different \(ES\) values yields various space/time tradeoffs. On the other hand, these smaller representations reaching 0.9–1.6 bpe on the larger graphs are also significantly slower, requiring 5–20 \(\upmu \)s per retrieved neighbor.

4.4 Scalability

Even if we aim at fitting the final compressed graph in main memory, the original graph \(G2\) may be much larger and prevent a direct in-memory application of the first phase of the algorithm, DSM. We consider this problem in this section.

A simple approach to this problem is to maintain \(G(V,E)=G2(V2,E2)\) on disk and use the main memory to keep the matrix of hash values of size \(P \times |V|\) described in Step 1 (recall Sect. 3.2), taking advantage of the fact that \(|V| \ll |E|\). Each row of the \(P\times |V|\) matrix (the \(P\) hashes associated with one adjacency list) can be computed independently of the others, and this step requires only one traversal over the graph. This step is thus also suitable for data streaming or for computing groups of rows in parallel.

Step 2 runs in main memory, storing and sorting the matrix by columns. Once the matrix has been sorted, we proceed to create the actual clusters in Step 3, where we need to access the graph stored on disk. After Step 2, we have the set of node ids of each cluster, so we can load from disk only the blocks we need for a set of clusters. Here it is important that, thanks to the locality of reference found in Web graphs, there is a high probability that clusters are formed by nearby adjacency lists that reside on the same or a few disk blocks. We refer to this number of disk blocks as \(k\). Steps 3 and 4 require keeping in memory the blocks where the current clusters reside, in order to find dense subgraphs and replace adjacency lists with virtual nodes and their definitions. Since replacing edges with virtual nodes reduces the number of edges, the graph is smaller at the end of each iteration. After the replacements are done, the disk blocks are written back to disk. Thus, considering \(T\) iterations and \(k\) disk blocks for maintaining adjacency lists, the worst-case I/O cost of the complete algorithm is \(O(T ((|E|+|V|)/B + k))\), where \(B\) is the disk block size. The algorithm needs only a few iterations in practice (at most \(T=10\)), and \(k\) is usually rather small, which makes the algorithm almost I/O optimal in practice.

However, since Web graphs exhibit locality of reference, we can also divide the graph into multiple parts and process each part independently, at the cost of losing some inter-part dense subgraphs. In this way, we can reduce the memory and processing time according to the needs of each part. Processing each part independently is also attractive for parallel and distributed processing.

This is done in three stages. First, we apply DSM (in main memory or on disk) over each part (parts can be just node ranges in natural order). Second, we remap virtual node identifiers so that they are globally unique. Third, we merge all the reduced graphs and apply AD reordering and encoding.

We evaluated the partitioning scheme to measure the impact of locality of reference on compression and on disk block requirements. We took the smallest Web graph, eu-2005, and evaluated compression using different numbers of parts, separating the nodes by splitting the node identifier range into \(NP\) parts. We first apply DSM-ES15-T10 (with \(ES=15\) and \(T=10\)) on all parts, then remap the nodes, and finally merge and apply AD\(_8\). Table 11 shows the number of disk blocks \(k\) (for a block size of 4 KB) required for sets of 1,000 clusters; the value of \(k\) displayed considers the first iteration and all parts. It shows that, even with 20 parts, we still obtain good results in terms of edge reduction, disk block requirements, and compression performance measured in bpe. Since our last stage, using AD, is applied over the merged edge-reduced graph, the memory requirement depends basically on the edge compression gain (\(|E2|/|E3|\)). We also show the space requirement of the input graph, as the maximum of \(|V2|+|E2|\) over the parts, and the number of nodes and edges required to store \(G3\) (\(|V3|+|E3|\)).

Table 11 Compression of graph eu-2005 divided in different number of parts

We also experimented with a larger dataset, uk-2006-05,Footnote 11 which has 77,741,046 nodes and 2,965,197,340 edges. We divide the graph into 10 parts, which yields parts with between roughly 217 and 410 million edges. We achieve 1.65 bpe and a neighbor retrieval time of about 0.54 \(\upmu \)s. These results show that using, say, DSM-ES15-T10 plus AD\(_8\) provides a scalable approach for large Web graphs. In contrast, using AD\(_8\) standalone we obtain 2.34 bpe. Using BV standalone, we achieve 2.12 bpe at maximum compression, where queries are not supported. Using GB with \(h=64\), we achieve 1.75 bpe and a neighbor retrieval time of 0.36 \(\upmu \)s, whereas with \(h=128\) the space is 1.59 bpe and the query time is 0.65 \(\upmu \)s. Therefore, the main conclusions we had reached, namely that our new scheme and GB provide similar performance on Web graphs and dominate all the other approaches, seem to be robust and remain valid on much larger graphs.

The conclusions obtained on bidirectional representations also remain valid, that is, our representations supporting out/in-neighbor queries are much smaller yet slower. Combining the results of DSM with 10 parts with k2treeBFS on graph uk-2006-05 yields 1.29 bpe and a neighbor retrieval time of 12.4 \(\upmu \)s. The standalone k2treeBFS obtains 1.78 bpe with a retrieval time of 4.12 \(\upmu \)s.

5 Compact data structure for dense subgraphs

In this section, we present a new compressed graph representation based on dense subgraphs that supports out/in-neighbor queries as well as various mining queries. We extract dense subgraphs essentially as in Sect. 3 and represent them using compact data structures based on bitmaps and symbol sequences (described in Sect. 2.2). Recalling Definition 3.1, our goal is to represent the \(|S|\cdot |C|\) edges of a dense subgraph \(H(S,C)\) in space proportional to \(|S|+|C|-|S\cap C|\). Thus, the bigger the dense subgraphs we detect, the more space we save when representing their edges. This representation does not use virtual nodes, and its output is no longer a graph. As a result, we cannot iterate on the discovery algorithm in order to find dense subgraphs involving virtual nodes.

5.1 Extracting dense subgraphs

We extract dense subgraphs using the algorithms described in Sect. 3. We use three parameters: \(P\), the number of hashes in the clustering stage of the dense subgraph discovery; a list of \(ES\) values, where \(ES\) is the minimum \(|S|\cdot |C|\) size of the dense subgraphs found; and \(threshold\). Parameters \(P\) and \(ES\) are the same as before, yet now we use a decreasing list of \(ES\) values. The discovery algorithm keeps extracting subgraphs of size at least \(ES_i\) until the number of subgraphs found in a single iteration drops below \(threshold\); then \(ES\) is set to the next value in the list for the next iteration. Note that in this case we do not use the parameter \(T\) (number of iterations), since the number of iterations depends on the number of subgraphs extracted in each iteration and on the \(threshold\) value. The goal of having the \(ES\) list in decreasing order is to avoid that extracting a small dense subgraph precludes the identification of a larger one, which would give a higher benefit. Note that this was not so critical in Sect. 4, where we were allowed to iterate over the dense subgraph discovery process and let virtual nodes participate in larger dense subgraphs.
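The control loop can be sketched as follows; discover_pass is a hypothetical helper standing for one clustering-plus-mining pass that returns the subgraphs of size at least \(ES\) it found and removes their edges from the graph.

```python
def extract_all(graph, es_list=(500, 100, 50, 30, 15, 6), threshold=100):
    """Drive the extraction with a decreasing list of minimum subgraph sizes."""
    found = []
    for es in es_list:
        while True:
            subgraphs = discover_pass(graph, es)    # hypothetical single pass
            found.extend(subgraphs)
            if len(subgraphs) < threshold:
                break                               # switch to the next, smaller ES
    return found
```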

5.2 Representing the graph

After we have extracted all the interesting dense subgraphs from \(G(V,E)\), we represent \(G\) as the set of dense subgraphs plus a remaining graph.

Definition 5.1

Let \(G(V,E)\) be a directed graph, and let \(H(S_r,C_r)\), \(1 \le r \le N\), be edge-disjoint dense subgraphs of \(G\). Then, the corresponding dense subgraph representation of \(G\) is \((\mathcal H ,\mathcal R )\), where \(\mathcal H = \{ H(S_1,C_1), \ldots , H(S_N,C_N)\}\) and \(\mathcal R = G - \bigcup _r H(S_r,C_r)\) is the remaining graph.

Figure 8a shows the adjacency list representation for the graph presented in Fig. 1, where we have already added the self-loops. We also show a dense subgraph, and a remaining subgraph. Figure 8b shows our compact representation.

Fig. 8 Dense subgraph representation

5.3 Compact representation of \(\mathcal H \)

Let \(\mathcal H = \{H_1, \ldots , H_N\}\) be the dense subgraph collection found in the graph, following Definition 5.1. We represent \(\mathcal H \) as a sequence of integers \(X\) with a corresponding bitmap \(B\). Sequence \(X = X_1 : X_2 : \ldots : X_N\) represents the sequence of dense subgraphs, and bitmap \(B = B_1 : B_2 : \ldots : B_N\) marks the component boundaries within each subgraph. We now describe how a given \(X_r\) and \(B_r\) represent the dense subgraph \(H_r = H(S_r,C_r)\).

We define \(X_r\) and \(B_r\) based on the overlapping between the sets \(S\) and \(C\). Sequence \(X_r\) will have three components: \(L, M\), and \(R\), written one after the other in this order. Component \(L\) lists the elements of \(S-C\). Component \(M\) lists the elements of \(S \cap C\). Finally, component \(R\) lists the elements of \(C-S\). Bitmap \(B_r = 10^{|L|}10^{|M|}10^{|R|}\) gives alignment information to determine the limits of the components. In this way, we avoid repeating nodes in the intersection and have sufficient information to determine all the edges of the dense subgraph. Figure 8b shows this representation for our example, which has just one dense subgraph. Algorithm 1 describes how \(X\) and \(B\) are built.

Algorithm 1 Building the sequence X and the bitmap B
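A minimal Python sketch of this construction (the role of Algorithm 1), taking each dense subgraph as a pair of node-id sets \((S,C)\) and producing plain Python lists in place of the compact sequences:

```python
def build_X_B(subgraphs):
    """Build sequence X and bitmap B from dense subgraphs given as (S, C) set pairs.
    For each subgraph, X lists L = S-C, then M = S∩C, then R = C-S, and B holds
    1 0^|L| 1 0^|M| 1 0^|R| to delimit the three components."""
    X, B = [], []
    for S, C in subgraphs:
        for component in (sorted(S - C), sorted(S & C), sorted(C - S)):
            B.append(1)
            B.extend([0] * len(component))
            X.extend(component)
    return X, B
```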

We compress the graph \(G = \mathcal H \cup \mathcal R \), using sequence \(X\) and bitmap \(B\) for \(\mathcal H \). For \(\mathcal R \), we use some bidirectional compressed graph representation.

To support our query algorithms, \(X\) and \(B\) are represented with compact data structures for sequences that implement the \(rank/select/access\) operations. We use WTs [33] for the sequence \(X\) and the compressed bitmap representation RRR [49] for the bitmap \(B\). The total space is \(|X|H_0(X) + o(|X|\log \sigma ) + |X|H_0(B)\) bits, where \(\sigma \le |V|\) is the number of distinct vertices in \(\mathcal H \). The term \(|X|H_0(X)+o(|X|\log \sigma )\) owes to the wavelet tree representation, whereas \(|X|H_0(B)+o(|X|)\) owes to the bitmap \(B\). Note that \(|X|\) is the sum of the number of nodes of the dense subgraphs in \(\mathcal H \), which can be much less than the number of edges it represents.

Algorithm 2 Out-neighbor query on (X, B)
Algorithm 3 In-neighbor query on (X, B)

5.4 Neighbor queries

We answer out/in-neighbor queries as described by Algorithms 2 and 3. Their complexity is \(O((|output|+1)\log \sigma )\), which is away from optimal by a factor \(O(\log \sigma )\). To exemplify the treatment of \((u,u)\) edges, these algorithms always remove them before delivering the query results (as explained, a more complex management is necessary if the graph actually contains some of those edges). Note that this finds only the edges represented in component \(\mathcal H \); those in \(\mathcal R \) must also be extracted, using the out/in-neighbor algorithm provided by the representation chosen for it.

We explain how the out-neighbor algorithm works; the case of in-neighbors is analogous. Using \(select_X(u,i)\), we find all the positions where node \(u\) is mentioned in \(X\). Each such occurrence belongs to some \(X_r\), but we do not yet know which one. Then, we analyze \(B\) to determine whether this occurrence of \(u\) lies inside component \(L\), \(M\), or \(R\). In cases \(L\) and \(M\), we use \(B\) again to delimit components \(M\) and \(R\), and output all the nodes of \(X_r\) in those components. If \(u\) is in component \(R\), instead, there is nothing to output for the out-neighbor query.
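The following is a minimal sketch of this procedure, under the layout of the previous sketch. Plain Python lists stand in for the wavelet tree over \(X\) and the compressed bitmap \(B\); the scans below correspond to the \(select/rank\) operations those structures answer directly, so the naive bookkeeping is illustrative only, not the authors' implementation.

    def out_neighbors_H(u, X, B):
        # Positions (0-based) of the '1' separators in B, plus a closing sentinel;
        # a compressed bitmap would obtain these with select_1 on B.
        ones = [i for i, b in enumerate(B) if b == 1] + [len(B)]
        # x_of[c] = first X index of component c; component c covers X[x_of[c]:x_of[c+1]]
        x_of = [p - c for c, p in enumerate(ones)]
        result = set()
        for j, x in enumerate(X):                     # stand-in for select_X(u, i)
            if x != u:
                continue
            c = next(k for k in range(len(x_of) - 1) if x_of[k] <= j < x_of[k + 1])
            r, comp = divmod(c, 3)                    # subgraph index, component (0=L, 1=M, 2=R)
            if comp == 2:
                continue                              # u appears only as a center: no out-edges here
            # out-neighbors are all nodes listed in components M and R of subgraph r
            result.update(X[x_of[3 * r + 1]:x_of[3 * r + 3]])
        result.discard(u)                             # drop the artificial self-loop
        return sorted(result)

    # X and B as produced by the previous sketch
    X = [1, 2, 3, 4, 5, 6]
    B = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
    print(out_neighbors_H(1, X, B))   # [3, 4]
    print(out_neighbors_H(5, X, B))   # [6]

The in-neighbor query is symmetric: occurrences of \(u\) in components \(M\) or \(R\) output the nodes listed in components \(L\) and \(M\) of the same subgraph.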

5.5 Supporting mining queries

An interesting advantage of our compressed structure is that it enables retrieving the actual dense subgraphs found in the graph. For instance, we are able to recover cliques and bicliques in addition to navigating the graph. Algorithm 4 shows how easy it is to recover all the cliques and bicliques stored in the compressed structure. This information can be useful for mining and analyzing Web and social graphs. The time complexity is \(O(|output| \cdot \log \sigma )\).

Note that this simplified algorithm reports only pure cliques and bicliques. A slight modification would make it also extract the clique \(S \cap C\) contained in a general dense subgraph \(H(S,C)\), or the bicliques \((S-C,C)\) and \((S,C-S)\).

Algorithm 4 (listing the cliques and bicliques stored in \(\mathcal H \))
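As a concrete illustration, here is a small Python sketch in the spirit of Algorithm 4, under the layout of the earlier sketches; it reports only pure cliques (empty \(L\) and \(R\)) and pure bicliques (empty \(M\)), skipping general dense subgraphs, and is not the authors' implementation.

    def cliques_and_bicliques(X, B):
        ones = [i for i, b in enumerate(B) if b == 1] + [len(B)]
        x_of = [p - c for c, p in enumerate(ones)]     # X boundaries of every component
        for r in range((len(x_of) - 1) // 3):
            L = X[x_of[3 * r]:x_of[3 * r + 1]]
            M = X[x_of[3 * r + 1]:x_of[3 * r + 2]]
            R = X[x_of[3 * r + 2]:x_of[3 * r + 3]]
            if not L and not R:
                yield ('clique', M)                    # S = C = M
            elif not M:
                yield ('biclique', (L, R))             # sources L, centers R

    X = [1, 2, 3, 4, 5, 6]
    B = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
    print(list(cliques_and_bicliques(X, B)))
    # [('biclique', ([1, 2], [3, 4])), ('clique', [5, 6])]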

Another interesting query is computing the density of the dense subgraphs stored in \(\mathcal H \). Let us use a definition of density [2] that considers the connections inside a subgraph: A subgraph \(G^{\prime }(V^{\prime },E^{\prime })\) is \(\gamma \)-dense if \(\frac{|E^{\prime }|}{|V^{\prime }|(|V^{\prime }| - 1)/2} \ge \gamma \). The density of a clique is always 2, since all \(|V^{\prime }|(|V^{\prime }|-1)\) directed edges are present. The density of a biclique \((S,C)\) is \(\frac{2\cdot |S|\cdot |C|}{(|S|+|C|)(|S|+|C|-1)}\). Algorithm 5 computes the density of all dense subgraphs and reports those whose density reaches a given \(\gamma \).
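The following Python sketch, in the spirit of Algorithm 5, computes these densities from \(X\) and \(B\) and reports the subgraphs reaching a given \(\gamma \). It assumes that the artificial self-loops are not counted as edges (which is what makes a clique attain density exactly 2, as stated); the bookkeeping mirrors the earlier sketches and is not the authors' code.

    def dense_enough(X, B, gamma):
        ones = [i for i, b in enumerate(B) if b == 1] + [len(B)]
        x_of = [p - c for c, p in enumerate(ones)]
        for r in range((len(x_of) - 1) // 3):
            l = x_of[3 * r + 1] - x_of[3 * r]        # |L| = |S - C|
            m = x_of[3 * r + 2] - x_of[3 * r + 1]    # |M| = |S ∩ C|
            k = x_of[3 * r + 3] - x_of[3 * r + 2]    # |R| = |C - S|
            n = l + m + k                            # |V'| of the dense subgraph
            edges = (l + m) * (m + k) - m            # |S|·|C| minus the artificial self-loops
            density = edges / (n * (n - 1) / 2)
            if density >= gamma:
                yield r, density

    X = [1, 2, 3, 4, 5, 6]
    B = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
    print(list(dense_enough(X, B, 1.0)))   # [(1, 2.0)]: only the clique is 1-dense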

Some other possible mining queries are the following (a combined sketch of the three counters is given after the list):

  • Get the number of cliques where node \(u\) participates. We just count the number of times node \(u\) appears in an \(M\) component of \(X\). The algorithm is similar to Algorithm 2, except that it only identifies the component where \(u\) lies and increments a counter whenever that component is \(M\).

  • Get the number of bicliques where node \(u\) participates. This is basically the same as the previous query, except that this time we count the occurrences of node \(u\) in components \(L\) or \(R\). If \(u\) is in \(L\), it acts as a \(source\); if it is in \(R\), it acts as a \(center\).

  • Get the number of subgraphs. We just compute the number of 1s in \(B\) and divide it by 3, since every dense subgraph in \(X\) contributes three 1s to \(B\), as shown in Fig. 8.
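A combined Python sketch of these three counters, using the same illustrative bookkeeping as the previous sketches (a wavelet tree over \(X\) and a compressed bitmap \(B\) would answer the underlying \(rank/select\) operations directly), could look as follows; the names are hypothetical.

    def mining_counters(u, X, B):
        ones = [i for i, b in enumerate(B) if b == 1] + [len(B)]
        x_of = [p - c for c, p in enumerate(ones)]
        in_M = in_L_or_R = 0
        for c in range(len(x_of) - 1):
            if u in X[x_of[c]:x_of[c + 1]]:
                if c % 3 == 1:
                    in_M += 1               # u lies in the clique part S ∩ C
                else:
                    in_L_or_R += 1          # u is a source (L) or a center (R)
        n_subgraphs = sum(B) // 3           # three 1s in B per dense subgraph
        return in_M, in_L_or_R, n_subgraphs

    X = [1, 2, 3, 4, 5, 6]
    B = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
    print(mining_counters(5, X, B))   # (1, 0, 2): node 5 appears once in an M component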

Algorithm 5 (density of the stored dense subgraphs)

5.6 Dense subgraph mining effectiveness

We experiment with the same Web graphs of Sect. 4.1, plus various social networks that are also available on the WebGraph site. In addition, we use the LiveJournal directed graph, available from the Stanford Network Analysis Package (SNAP) projectFootnote 12 (LiveJournal-SNAP). Table 12 lists their main statistics.

Table 12 Number of nodes and edges of the graphs, and performance of the dense subgraph mining algorithm

We used our dense subgraph discovery algorithm with parameters \(ES=500,100,50,30,15,6\), discovering dense subgraphs from larger to smaller. We used \( threshold =10\) for eu-2005, enron, and dblp-2011; \( threshold =100\) for indochina-2004, uk-2002, LiveJournal-2008, and LiveJournal-SNAP; and \( threshold =500\) for arabic-2005.

Table 12 also gives some performance figures for our dense subgraph mining algorithm. On Web graphs (where we give the input to the mining algorithm in natural order), 91–95 % of the edges are captured in dense subgraphs, a figure that would have been only slightly lower had we captured only bicliques [16]. Finding dense subgraphs, however, captures the structure of social networks much better than finding only bicliques, improving the percentage of edges captured from 46–55 to 48–65 %. Note also that the fraction of edges in dense subgraphs is much lower on social networks, which anticipates the well-known fact that Web graphs are more compressible than social networks.

Table 13 complements this information with the fraction of cliques, bicliques, and other dense subgraphs with respect to the total number of dense subgraphs found, as well as their average size. It shows that pure cliques are not very significant and that more than half of the time the algorithm is able to extend a biclique to a more general dense subgraph, thereby improving the space usage.

Table 13 Fraction and average size of cliques, bicliques, and the remaining dense subgraphs found

The next experiments consider the final size of our representation. For the component \(\mathcal H \), we represent sequence \(X\) using WT or GMR, and for bitmap \(B\), we use RG or RRR. These implementations are obtained from the library libcds.Footnote 13 For WT, we used the variant “without pointers.” For the component \(\mathcal R \), we use either k2tree [12] or MP\(_k\) [23], the improvement over the proposal of Maserrat and Pei [43]. Although we use the most recent version of the k2tree, we use it with natural node ordering to maintain consistency between the node names in \(\mathcal H \) and \(\mathcal R \). An alternative would have been to use BFS ordering for both, that is, reordering before applying the dense subgraph mining, but this turned out to be less effective.

Table 14 shows how the compression evolves depending on parameter \(ES\), on graph dblp-2011. The \(ES\) values in Tables 14 and 15 represent the last value we consider in the \(ES\) list. For instance, \(ES=100\) in Table 14 means that we use the sequence of values \(ES=500,100\). As \(ES\) decreases, we capture more dense subgraphs; yet, they are of lower quality, and thus, their space savings decrease. To illustrate this, we show the length \(|X| = \sum _r (|S_r|+|C_r|-|S_r\cap C_r|)\), the number of bytes used to represent \(X\) and \(B\) (“\(|\mathcal H |\) in bytes”, using WT for \(X\) and RRR for \(B\)), and the total number of edges represented by \(\mathcal H \) (RE \( = \sum _r |S_r|\cdot |C_r|\)). All these indicators grow as \(ES\) decreases. We then show the size of \(\mathcal R \) in bytes (using representation MP\(_k\), with the best \(k\) for \(\mathcal R \)), which decreases as \(ES\) decreases. As explained, the ratio RE/\(|X|\), which indicates the average number of edges represented by each node we write in \(X\), also decreases. Finally, we give the overall compression performance achieved in bpe, computed as \(bpe = (bits(\mathcal H )+bits(\mathcal R ))/|E|\). It turns out that there is an optimum \(ES\) value for each graph, which we use to maximize compression.

Table 14 Evolution of compression as \(ES\) decreases, for the dblp-2011 data set

Tables 15 and 16 compare the compression we achieve with that of the alternatives we have chosen, on Web and social graphs, respectively. We show the last \(ES\) value used for discovering dense subgraphs, the ratio RE\(/|X|\), and the compression performance in bpe. For compressing \(\mathcal H \), we use WT and RRR with sampling parameter 64. For compressing \(\mathcal R \), we use k2treeNAT for Web graphs and MP\(_k\) for social networks, which gave the best results (with enron as an exception, where using k2treeNAT on \(\mathcal R \) provides better compression than MP\(_k\), as displayed).

Table 15 Compression performance for Web graphs, compared to other techniques
Table 16 Compression performance for social networks, compared to other techniques

We compare the results with standalone k2treeBFS on Web graphs, k2treeNAT on enron, and MP\(_k\) on the other social networks.

Our technique does not obtain space gains on Web graphs compared to k2treeBFS. Moreover, the variant DSM-ES10-T10+k2treeBFS of Sect. 4.3, also included in the table, is even better.

On social networks, the gains of our new technique over MP\(_k\) are more modest. However, we show next that our structure is also faster. Moreover, unlike on Web graphs, there are no other competing techniques: our development of Sect. 4.3 does not work at all (it reduces the edges by less than 1.5 %, while increasing the number of nodes with the virtual ones it introduces). The next best result is obtained with BV (which is more effective than GB and AD on social networks).

We note that BV is unable to retrieve in-neighbors. To carry out a fair comparison, we follow the BV authors' suggestion [9] for supporting out/in-neighbor queries. They suggest computing the set \(E_{\mathrm{sym}}\) of all symmetric edges, that is, those for which both \((u,v)\) and \((v,u)\) exist. Then, they consider the graphs \(G_{\mathrm{sym}}(V, E_{\mathrm{sym}})\) and \(G_d(V, E - E_{\mathrm{sym}})\), so that storing \(G_{\mathrm{sym}}\), \(G_d\), and the transpose of \(G_d\) enables both types of queries. The space we report in Table 16 for BV considers this arrangement and, as anticipated, is not competitive.
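For concreteness, the following Python sketch shows the kind of edge split this arrangement requires; plain dictionaries of adjacency sets stand in for the three BV-compressed graphs actually stored, so this is only an illustration of the decomposition, not of the BV format itself.

    from collections import defaultdict

    def split_for_bidirectional(edges):
        """edges: iterable of directed pairs (u, v)."""
        edge_set = set(edges)
        g_sym, g_d, g_d_t = defaultdict(set), defaultdict(set), defaultdict(set)
        for u, v in edge_set:
            if (v, u) in edge_set:
                g_sym[u].add(v)          # symmetric part E_sym
            else:
                g_d[u].add(v)            # remaining directed edges
                g_d_t[v].add(u)          # their transpose, for in-neighbor queries
        return g_sym, g_d, g_d_t

    g_sym, g_d, g_d_t = split_for_bidirectional([(1, 2), (2, 1), (1, 3)])
    print(sorted(g_sym[1] | g_d[1]))     # out-neighbors of 1: [2, 3]
    print(sorted(g_sym[3] | g_d_t[3]))   # in-neighbors of 3: [1]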

5.7 Space/time performance

Figure 9 shows the space/time tradeoffs achieved on the dblp-2011 and LiveJournal-SNAP graphs, considering only the \(\mathcal H \) component. We test different \(ES\) parameters. We use WT and GMR for the structures that represent \(X\), and RRR for \(B\); these are indicated in the plots as WT-r and GMR-r. The sampling parameter for RRR is 16, 32, and 64, which yields a line for each combination. Throughout this section, we measure out-neighbor query times, as in-neighbor queries perform almost identically. We observe that using WT provides more compression than GMR, but requires more time.

Fig. 9 Space/time efficiency with out-neighbor queries on social networks, for various \(ES\) values (only component \(\mathcal H \) is considered)

The plots show that increasing \(ES\) improves space and time simultaneously, until the optimum space is reached. Using a larger \(ES\) value also implies fewer iterations of the dense subgraph extraction algorithm, which dominates construction time (currently 0.1–0.2 ms per extracted edge, although construction is not yet optimized).

We now consider our technique on social networks, representing both \(\mathcal H \) and \(\mathcal R \), the latter using either k2tree or MP\(_k\), and compare it in space and time with the state of the art. This includes standalone k2trees with BFS and natural order, MP\(_k\) with the best \(k\), and, as a control value, BV with out/in-neighbor support. Now, our time is the sum of the time spent on \(\mathcal H \) and on \(\mathcal R \). We represent \(\mathcal H \) using our best alternatives, based on DSM-ES\(x\)-WT-r and DSM-ES\(x\)-GMR-r.

Figure 10 compares the results on social networks. The inner plots show a closeup of the best alternatives. While on enron the k2tree with natural order is the best choice when using little space, on the other networks our combination of DSM and MP\(_k\) is the best, slightly outperforming standalone MP\(_k\) in both space and time.

Fig. 10 Space/time tradeoffs for social networks

Figures 11 and 12 carry out a similar study on Web graphs. In Fig. 11, we also show that on these graphs DSM improves significantly in space with respect to detecting only bicliques (“BI”), while the time is similar. Figure 12 shows that the structure proposed in this section is dominated in space and time by that of Sect. 4. Yet, we recall that the structure proposed in this section is able to answer, easily and with no extra space, various mining queries related to the dense subgraphs found.

Fig. 11 Space/time efficiency with out-neighbor queries on Web graphs, for various sequence representations (only component \(\mathcal H \) is considered)

Fig. 12 Space/time tradeoffs for Web graphs

6 Conclusion

This paper studies graph compression schemes based on finding dense subgraphs. Dense subgraphs generalize the bicliques considered in previous work [16], and our experiments show that this generalization pays off in terms of compression performance. We show how previous biclique discovery algorithms can be adapted to detect dense subgraphs.

We first present a compression scheme based on factoring out the edges of dense subgraphs using virtual nodes, which turns out to be suitable for Web graphs. After iteratively reducing the graph via virtual nodes, we list the nodes in BFS order and apply an encoding related to that order [4]. The resulting space and time performance is very similar to that of the best current representation supporting out-neighbor queries (Grabowski and Bieniecki 2011). When supporting both out- and in-neighbor queries, instead, our technique generally offers the best time when using little space. In case the graph does not fit in main memory, we propose a disk-friendly approach that exploits locality of reference and data partitioning to build the compressed structure while keeping almost the same compression performance. Partitioning the data is also attractive for parallel and distributed processing.

If, instead, we combine the result of dense subgraph mining with a bidirectional representation, the k2tree [11] with BFS node ordering, we obtain the most space-efficient representation of Web graphs that supports out/in-neighbor queries in a few microseconds per retrieved value.

We present a second compression scheme, also based on dense subgraphs, yet using compact data structures instead of virtual nodes to represent them. The result turns out to be more suitable for compressing social networks with out/in-neighbor support, achieving the least space while answering queries in a few microseconds. Since extracting dense subgraphs is nontrivial, and they expose community substructures in social networks, these dense subgraphs may be useful for other graph mining and analysis purposes. A distinguishing feature of our representation is that it gives easy access to these dense subgraphs without any additional space.

Despite the enormous progress made in the last decade on Web graph compression, the amount of activity in this area shows that further compression is perfectly possible. The case of social networks is more intriguing: the techniques that have been successful on Web graphs have much less impact, the best results are achieved using other properties [9, 43], and still the results are much poorer. Perhaps social networks are intrinsically less compressible than Web graphs, or perhaps we have not yet found the right properties that permit compressing them further. We believe that our extension to finding more general dense subgraphs (not just bicliques) is an interesting step toward that goal. Another line of development we have contributed to is supporting more complex operations on the compressed representations, not only direct navigation (out-neighbors) but also bidirectional navigation and other more complex queries (such as the mining queries we support on the dense subgraphs found).