
1 Introduction

In data analysis, graphs play an important role in representing the inner structure of big data, and they are widely used in various domains, such as the Internet, social networks, and recommendation systems. Numerous data analysis problems can be transformed into graph processing problems and solved by basic graph algorithms. Since the scale of data has grown exponentially in recent years, many graph processing problems must be solved by distributed systems for graph analysis, such as Pregel [1], GraphLab [2], PowerGraph [3], GraphX [4], etc. In this case, the original big graph needs to be partitioned into several subgraphs that are processed separately on different computing nodes of the distributed system. Compared with distributed systems for conventional data, graph processing requires frequent access to the neighboring vertices and edges of a given vertex, which may not be stored on the same computing node as that vertex and thus cause high communication cost. Therefore, an ideal graph partition has to minimize the number of boundary vertices or edges that need to be cut and thereby reduce the total communication cost. Moreover, load balance should also be taken into account.

As one of the most classic problems in graph theory, graph partitioning is proved to be NP-complete [5]. Research in this field can be broadly classified into two categories: vertex partitioning and edge partitioning. Most previous work focuses on the former approach. As shown in Fig. 1(a), vertex partitioning aims to distribute vertices among computing nodes such that each node contains a similar number of vertices while the number of edges shared among multiple nodes is as small as possible. Though it has been studied for years, the performance of vertex partitioning in real-world applications is not satisfactory [6-8]. One primary reason is that the number of edges assigned to different computing nodes can differ greatly, causing unbalanced computing cost. Another reason is that the number of edges in real-world graphs is generally much larger than the number of vertices, so many edges are cut by the vertex partitioning approach, which leads to considerable communication cost.

Some research on edge partitioning has emerged in recent years [9-11]. As shown in Fig. 1(b), edge partitioning aims to distribute edges among computing nodes such that each node contains a similar number of edges while the number of vertices shared among multiple nodes is as small as possible. [3, 4, 12] show, from both theoretical and practical perspectives, that edge partitioning yields better computing performance than vertex partitioning. Moreover, the edge partitioning approach is increasingly applied in popular graph processing systems, including GraphLab, PowerGraph, and GraphX. However, the current algorithms used in these systems are fairly simple; most rely on a random partitioning mechanism. Though the partitioning procedure is fast in this way, it cannot guarantee that the number of times vertices are cut is minimized. An effective and efficient edge partitioning algorithm therefore has an important influence on processing large-scale graphs in distributed graph systems.

Fig. 1. Graph partitioning in two ways: vertex partitioning and edge partitioning.

In this paper, we propose a parallel edge partitioning algorithm for distributed graph systems, named VSEP (Vertex Swap and Edge Partitioning). VSEP applies heuristic methods to compute an edge partitioning iteratively. Two heuristics are proposed, namely VSEP-square and VSEP-diagonal. We compare our algorithm with a state-of-the-art algorithm, JA-BE-JA-VC [11]. Experimental results show that VSEP-diagonal reduces the number of times vertices are cut by about \(10\,\%\sim 20\,\%\) while retaining balanced partition sizes compared with JA-BE-JA-VC.

The remainder of this paper is organized as follows. Section 2 introduces related work, including graph partitioning algorithms and distributed graph systems. In Sect. 3, we formally define the graph partitioning problem and formulate the optimization objective. The novel algorithm VSEP is introduced in detail in Sect. 4, including the two heuristics VSEP-square and VSEP-diagonal. We present experimental studies in Sect. 5 and conclude this work in Sect. 6.

2 Related Work

Graph partitioning is one of the classic research fields in graph theory. Most current partitioning algorithms focus on vertex partitioning [13-20], while edge partitioning can be implemented via vertex partitioning by transforming the original graph into its line graph. There are two major categories of vertex partitioning algorithms: (i) centralized algorithms, and (ii) distributed algorithms.

The centralized algorithms assume random access to the entire graph, and usually use Multilevel Graph Partitioning (MGP) [16] to obtain better results. [17, 20-22] combine MGP with several different heuristics to improve vertex partitioning. METIS [17] adds heuristics to the coarsening, partitioning, and un-coarsening phases. KAFFPA [20] exploits the locality of the graph based on flows and localized searches. Besides the quality of vertex partitioning, partitioning speed is another aspect to be considered, and parallelization is a widely used technique to speed up partitioning. PARMETIS [18] and KAFFPAE [23] are parallel implementations of METIS and KAFFPA, respectively.

Although the centralized algorithms can achieve fast and good min-cuts, their assumption of random access to the entire graph is not feasible for large graphs. Distributed algorithms, which only require access to direct neighbors or a subset of vertices, are much better suited to large graph partitioning. JA-BE-JA [13] is a fully distributed graph partitioning algorithm based on local search and simulated annealing techniques [24]. The algorithm runs on each vertex independently, accessing only direct neighbors and a few random vertices. [25] implements adaptations of JA-BE-JA on Spark [26]. However, without global information, distributed algorithms may produce partitions of drastically different sizes.

Since edges often outnumber vertices by orders of magnitude, edge partitioning may be more efficient than vertex partitioning. [9-11] are recent works on edge partitioning. SBV-Cut [9] proposes the recursive application of structurally-balanced graph cuts based on a method for identifying a set of balancing vertices. DFEP [10] describes a distributed edge partitioning algorithm using a market model, in which partitions act as buyers that use funding to buy edges. Initially, all partitions are given the same amount of funding. Edge partitioning proceeds in iterations: in each iteration, every partition tries to buy edges whose source or destination vertex is already in the partition, and each edge is sold to the highest bidder, i.e., assigned to that partition. Moreover, a coordinator balances the size of the partitions.

JA-BE-JA-VC [11] applies a local search algorithm derived from JA-BE-JA in each iteration. The authors propose several heuristics to optimize the final partitioning, and the algorithm can produce partitions of any required size. Their experiments indicate that it outperforms DFEP and random partitioning.

3 Problem Statement

Throughout this paper, a directed graph \(G=(V,\,E)\) denotes the input graph, where V represents the set of vertices and E the set of edges. A k-way balanced edge partitioning divides the edge set E into k subsets of equal size, where k is an input parameter. Each partition also has a subset of vertices, namely those incident to at least one edge in that partition. Vertices are not unique across partitions: a vertex may appear in more than one partition if its edges are distributed across several partitions. A good edge partitioning strives to minimize the number of vertices that belong to more than one partition.

Formally, we use a function P to label each edge with a number in \(\{1,...,k\}\). The label \(i=P(e)\) of an edge e indicates that the edge belongs to partition \(E_i\). The partitions \(E_i\) must satisfy the following conditions:

$$\begin{aligned} \cup _{i=1}^kE_i = E \end{aligned}$$
(1)
$$\begin{aligned} E_i\cap E_j = \emptyset , \forall i,j \in \{1,...,k\}, i \ne j \end{aligned}$$
(2)

If all the edges incident to a vertex belong to the same partition, the vertex does not need to be cut. If the edges incident to a vertex belong to \(m(1<m\le k)\) different partitions, the vertex needs to be cut \(m-1\) times. We use Count(v) to denote the number of different partitions that contain edges of vertex v. The number of cut times of each vertex in the graph is then defined as follows:

$$\begin{aligned} w(v)=Count(v)-1, \forall v \in V \end{aligned}$$
(3)

The sum of cut times of all the vertices in a graph, denoted by W, is calculated by the following formula:

$$\begin{aligned} W=\sum _{i=1}^{|V|} w(v_i), \forall v_i \in V \end{aligned}$$
(4)

The value of W differs for different partitioning functions. We can now formulate the optimization problem: find a partitioning function P that minimizes W while meeting the following condition:

$$\begin{aligned} |E_i|=|E_j|, \forall i,j \in \{1,...,k\} \end{aligned}$$
(5)

Note that in all practical cases this condition is relaxed to require partitions of approximately equal size. This is important because the number of edges of the graph is not necessarily a multiple of k. Therefore, throughout this paper, we address the relaxed version of the problem.
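As a concrete illustration of Eqs. (3) and (4), the following Python sketch computes W for a given edge labeling. The names `cut_times` and `partition_of` are our own; this is a minimal reading of the definitions, not part of the VSEP algorithm itself.

```python
from collections import defaultdict

def cut_times(edges, partition_of):
    """Total cut times W: for each vertex, Count(v) - 1, where
    Count(v) is the number of distinct partitions its edges fall
    into, summed over all vertices (Eqs. (3) and (4))."""
    parts = defaultdict(set)          # vertex -> set of partition labels
    for idx, (u, v) in enumerate(edges):
        p = partition_of(idx)         # P(e), here in {0, ..., k-1}
        parts[u].add(p)
        parts[v].add(p)
    return sum(len(s) - 1 for s in parts.values())

# Example: a triangle with edge (2, 0) placed in its own partition.
edges = [(0, 1), (1, 2), (2, 0)]
labels = [0, 0, 1]
W = cut_times(edges, lambda i: labels[i])
print(W)  # vertices 0 and 2 each appear in both partitions -> W = 2
```

Placing all three edges in one partition would give W = 0, matching the intuition that an uncut vertex lives in exactly one partition.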

4 Solution

4.1 Motivation

In practical applications, we use an adjacency matrix or adjacency list to represent a graph. Figure 2 shows the adjacency matrix of a graph. In the matrix, the i-th row and the i-th column represent the relations between vertex \(v_i\) and all the other vertices in the graph. We use a square matrix \(\{a_{i,j}\}\) of 0s and 1s as the adjacency matrix: \(a_{i,j}\) is 1 if there is an edge from \(v_i\) to \(v_j\), and 0 otherwise. Each edge in the graph is thus uniquely mapped to one 1-entry in the matrix.

Fig. 2. Adjacency matrix representing a graph.

Based on the edge partitioning approach, if two edges share a vertex and are partitioned into two different partitions, the shared vertex is cut once. As shown in Fig. 2, there are 4 shaded areas, and edges in different shaded areas have no common vertices. Therefore, if we partition the edges of different shaded areas into different partitions, no vertex is cut. We thus partition the edges of a graph according to the following steps.

  1. (i)

    Divide the vertices of the graph into k vertex sets sequentially, each set corresponding to an edge partition.

  2. (ii)

    Assign each edge whose two endpoints belong to the same vertex set (i.e., an edge in a shaded area) to the edge partition corresponding to that vertex set.

  3. (iii)

    Assign each remaining edge to the edge partition of either one of its two endpoints.

From this process, we can see that the more edges have both endpoints in the same vertex set, the fewer times vertices are cut, and vice versa. In fact, the number of times vertices are cut equals the number of edges outside the shaded areas. Hence, dividing the vertices into k sets so as to increase the number of 1s in the shaded areas is an effective way to reduce the number of times vertices are cut.
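The three steps above can be sketched as follows. This is a minimal illustration of the assignment steps only, not the full VSEP algorithm, and the helper names (`partition_edges`, `vset`) are our own:

```python
def partition_edges(edges, num_vertices, k):
    """Steps (i)-(iii): vertices are split into k contiguous sets;
    every edge is assigned to the partition of one of its endpoints
    (here we arbitrarily pick the source vertex u, which also covers
    the intra-set case, since then both endpoints agree)."""
    s = -(-num_vertices // k)     # ceil(|V| / k): size of each vertex set
    vset = lambda v: v // s       # step (i): sequential vertex sets
    partitions = [[] for _ in range(k)]
    for (u, v) in edges:
        # step (ii)/(iii): intra-set edges land in their set's partition,
        # cross-set edges in the partition of endpoint u.
        partitions[vset(u)].append((u, v))
    return partitions

parts = partition_edges([(0, 1), (1, 2), (2, 3), (0, 3)], 4, 2)
print(parts)  # [[(0, 1), (1, 2), (0, 3)], [(2, 3)]]
```

Here the cross-set edges (1, 2) and (0, 3) are the ones that force vertices 2 and 3 to be cut, matching the observation that the cut count equals the number of edges outside the shaded areas.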

4.2 Vertex Swap

For a graph, we label each vertex with a unique order number. Different labelings lead to different locations of the 1s in the adjacency matrix, and swapping the order numbers of two vertices changes the labeling. For example, in Fig. 3 we swap the order numbers of vertex 2 and vertex 3. Correspondingly, the 2nd row of the adjacency matrix is exchanged with the 3rd row, and simultaneously the 2nd column is exchanged with the 3rd column. After the swap, the number of 1s in the shaded area increases by 2, so the number of times vertices are cut decreases. In general, an effective swap increases the number of 1s in the shaded area and decreases the number of times vertices are cut.

Fig. 3. Adjacency matrix after swapping the order numbers of vertex 2 and vertex 3.

We define the shaded area in the adjacency matrix, shown in Fig. 3, as the target area. The target area, denoted by \(T_{square}\), is a set of entries \(a_{ij}\) of the matrix. We calculate \(T_{square}\) by the following formula, where k is the number of edge partitions:

$$\begin{aligned} s=\lceil \frac{|V|}{k} \rceil \end{aligned}$$
(6)
$$\begin{aligned} T_{square} = \{a_{ij}\}, \forall i,j \in \{1,...,|V|\}, \lfloor i/s \rfloor =\lfloor j/s \rfloor \end{aligned}$$
(7)

We use \(countones(T_{square})\) to count the number of 1s in \(T_{square}\). We swap the order numbers of two vertices if the swap increases \(countones(T_{square})\). We call this heuristic square. Following this heuristic, we repeatedly select two vertices at random and decide whether to swap them or not.

After several rounds of swapping, we assign each edge whose two endpoints belong to the same vertex set to the edge partition corresponding to that set, and assign every remaining edge to the partition of either one of its endpoints. We call this process VSEP-square, shown in Algorithm 1, where r indicates the configurable number of iterations.

Algorithm 1. VSEP-square.

When the order numbers of two vertices are swapped, only the elements in the corresponding two rows and two columns of the matrix change. In practice, we simply use an array to record the successful swaps. Initially, the array is assigned 1, ..., |V| sequentially; on each swap, we exchange the two corresponding values in the array. Hence the algorithm only needs read access to the graph, without modification, and only reads and writes the array.

This approach is well suited to distributed parallel computing. On a distributed computing system like Spark or GraphLab, one computing unit creates many pairs of distinct vertices and distributes these pairs to other computing units, which decide whether each pair of vertices should be swapped. The first computing unit then collects the results and modifies the array. This process is repeated as necessary.
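A sequential Python sketch of this swap loop follows. It is our own simplification of Algorithm 1: the distributed variant would farm the candidate pairs out to workers as described above, and a practical implementation would update `countones` incrementally over the two affected rows and columns rather than rescanning all edges.

```python
import random

def vsep_square(adj, num_vertices, k, rounds):
    """Sketch of the VSEP-square swap loop. `adj` is a collection of
    directed edges (u, v); `order` is the array that records accepted
    swaps, so the graph itself is never modified."""
    s = -(-num_vertices // k)              # s = ceil(|V| / k)
    order = list(range(num_vertices))      # order[v] = order number of v

    def countones():                       # 1s inside the k diagonal squares
        return sum(1 for (u, v) in adj
                   if order[u] // s == order[v] // s)

    for _ in range(rounds):                # r iterations of random swaps
        a, b = random.sample(range(num_vertices), 2)
        before = countones()
        order[a], order[b] = order[b], order[a]      # tentative swap
        if countones() <= before:                    # keep only strict gains
            order[a], order[b] = order[b], order[a]  # undo the swap
    return order
```

Since `order` is the only mutable state, the accept/reject decisions for disjoint vertex pairs can be evaluated in parallel against a read-only snapshot of the graph.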

4.3 Balanced Partitions

In VSEP-square, the target area consists of k equal-sized squares, i.e., every square corresponds to an equally sized range of vertex order numbers. The implicit prerequisite is that every such range contains a similar number of edges. This holds when the edges of the graph are distributed evenly; otherwise a large number of edges may be assigned to one partition. In that case, although the number of times vertices are cut decreases, the sizes of the partitions differ greatly, which violates the balance requirement.

Fig. 4. The target area in the adjacency matrix under the diagonal heuristic.

To solve this problem, we change the target area into a band along the diagonal of the matrix, as shown in Fig. 4. The improved target area, denoted by \(T_{diagonal}\), is calculated by the following formula, where the parameter b bounds the width of the band:

$$\begin{aligned} b=\lfloor \sqrt{\lceil \frac{|V|}{k} \rceil } \rfloor \end{aligned}$$
(8)
$$\begin{aligned} T_{diagonal} = \{a_{ij}\}, \forall i,j \in \{1,...,|V|\}, |i-j|\le b \end{aligned}$$
(9)

Thus we no longer need to segment the vertex range into k equal-sized parts. We use \(countones(T_{diagonal})\) to count the number of 1s in \(T_{diagonal}\), and we swap the order numbers of two vertices if the swap increases \(countones(T_{diagonal})\). After several rounds of swapping, we traverse the adjacency matrix row by row and add the edge corresponding to each 1 into the current partition until the partition reaches the balanced size; we then start filling the next partition. Finally, we obtain k equal-sized partitions. We call this process VSEP-diagonal, shown in Algorithm 2.

Algorithm 2. VSEP-diagonal.
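The final assignment phase of VSEP-diagonal can be sketched as below. The function name is our own, and we assume the balanced capacity of each partition is \(\lceil |E|/k \rceil\); the band parameter b from Eq. (8) is used only during the swap rounds, not in this scan.

```python
def diagonal_assign(adj_matrix, k):
    """Sketch of the VSEP-diagonal assignment phase: scan the
    (reordered) adjacency matrix row by row and fill each partition
    up to a balanced capacity before opening the next one."""
    n = len(adj_matrix)
    num_edges = sum(sum(row) for row in adj_matrix)
    cap = -(-num_edges // k)                 # assumed capacity ceil(|E| / k)
    partitions = [[] for _ in range(k)]
    p = 0
    for i in range(n):
        for j in range(n):
            if adj_matrix[i][j]:             # a 1-entry is edge (v_i, v_j)
                if len(partitions[p]) == cap and p < k - 1:
                    p += 1                   # current partition full: open next
                partitions[p].append((i, j))
    return partitions
```

Because the swap rounds concentrate the 1s near the diagonal, consecutive edges in this row-by-row scan tend to share vertices, so filling partitions in scan order keeps the cut count low while enforcing balance directly.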

5 Experiments

5.1 Metrics and Datasets

We measure the following three metrics to evaluate the quality of the partitioning:

  1. (i)

    Coefficient of Variation of Partition (CVP): this metric measures the spread of partition sizes (in terms of the number of edges). It is obtained by dividing the standard deviation of the partition sizes by their average. A higher value indicates a greater difference between partitions; when the value is 0, all partitions have equal size, which is the best result.

  2. (ii)

    Vertex-cut-number: this metric counts the number of vertices that have to be cut. If a vertex is not cut, it exists in only one partition, so no communication is required for it and no extra storage is needed to record which partitions it belongs to. Therefore, the fewer vertices are cut, the less additional storage and communication are required.

  3. (iii)

    Vertex-cut-times: this metric counts the total number of times vertices are cut. That is, a vertex cut once has replicas in two partitions, and a vertex cut twice is replicated over three partitions. If the vertices of a graph are scattered over several partitions, every computation that modifies a vertex must be propagated to all the other replicas of that vertex for the sake of consistency. Therefore, Vertex-cut-times directly affects the communication cost of the partitioned graph.
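Under our reading of these definitions, the three metrics can be computed as in the following sketch (the function `evaluate` and its input format, a list of edge lists, are our own):

```python
import statistics
from collections import defaultdict

def evaluate(partitions):
    """Compute (CVP, Vertex-cut-number, Vertex-cut-times) for a list
    of edge partitions, each a list of (u, v) pairs."""
    sizes = [len(p) for p in partitions]
    cvp = statistics.pstdev(sizes) / statistics.mean(sizes)
    owners = defaultdict(set)                 # vertex -> partitions it touches
    for label, part in enumerate(partitions):
        for (u, v) in part:
            owners[u].add(label)
            owners[v].add(label)
    cut_number = sum(1 for s in owners.values() if len(s) > 1)
    cut_times = sum(len(s) - 1 for s in owners.values())
    return cvp, cut_number, cut_times

metrics = evaluate([[(0, 1), (1, 2)], [(2, 3), (3, 0)]])
print(metrics)  # (0.0, 2, 2): balanced sizes; vertices 0 and 2 are each cut once
```

Note that Vertex-cut-times coincides with the objective W of Sect. 3, which is why it is the metric most directly tied to communication cost.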

We use four graphs of different natures and sizes to evaluate the algorithms. These graphs and some of their properties are listed in Table 1. Note that the graphs Astroph and Email-Enron have power-law degree distributions.

Table 1. Description of testing practical graphs.

5.2 Comparisons with the State-of-the-Art Algorithm

We compare VSEP-square and VSEP-diagonal with the state-of-the-art algorithm JA-BE-JA-VC [11]. We configure JA-BE-JA-VC with parameters \(T_0=2\) and \(delta=0.005\), as in the experiments in [11]. We configure VSEP-square and VSEP-diagonal with parameter \(r=\lfloor |V|/2\rfloor \), i.e., the swap process iterates a number of times equal to half the number of vertices in the graph. We use these three algorithms to partition the four real graphs into 2, 4, 8, 16, 32, and 64 partitions and measure the three metrics described above.

Table 2 shows the CVP of JA-BE-JA-VC and VSEP-diagonal over the graphs, and Table 3 shows the CVP of VSEP-square. Since the maximum size of each partition is set in advance, JA-BE-JA-VC and VSEP-diagonal partition the graphs evenly, as shown in Table 2. VSEP-square, however, does not limit partition sizes, so the CVP values in Table 3 are greater than the corresponding values in Table 2. For Data and 4elt, which are not power-law graphs, the CVP of VSEP-square is only slightly greater than that of JA-BE-JA-VC and VSEP-diagonal. For Astroph and Email-Enron, which are power-law graphs, the CVP of VSEP-square is much greater, because the edge distribution in power-law graphs is far more unbalanced than in non-power-law graphs.

Table 2. CVP of JA-BE-JA-VC and VSEP-diagonal over graphs with different number of partitions.
Table 3. CVP of VSEP-square over graphs with different number of partitions.

Figure 5 shows the Vertex-cut-number of the three algorithms over the graphs with different numbers of partitions. The experimental results in Fig. 5 show that the Vertex-cut-number of JA-BE-JA-VC is the largest and that of VSEP-square is the smallest in all cases. Compared with JA-BE-JA-VC, VSEP-square reduces the Vertex-cut-number by \(36.82\,\%\) on Data, \(17.81\,\%\) on 4elt, \(21.55\,\%\) on Astroph, and \(51.50\,\%\) on Email-Enron when k is 64. VSEP-diagonal reduces the Vertex-cut-number by \(15.38\,\%\) on Data, \(13.45\,\%\) on 4elt, \(9.15\,\%\) on Astroph, and \(30.53\,\%\) on Email-Enron when k is 64.

Fig. 5.
figure 5

Vertex-cut-number over graphs with different number of partitions.

Figure 6 shows the Vertex-cut-times of the three algorithms over the graphs with different numbers of partitions. The experimental results in Fig. 6 show that, consistent with the above, the Vertex-cut-times of JA-BE-JA-VC is the largest and that of VSEP-square is the smallest in all cases. Compared with JA-BE-JA-VC, VSEP-square reduces the Vertex-cut-times by \(45.60\,\%\) on Data, \(30.13\,\%\) on 4elt, \(26.52\,\%\) on Astroph, and \(41.62\,\%\) on Email-Enron when k is 64. VSEP-diagonal reduces the Vertex-cut-times by \(21.05\,\%\) on Data, \(9.48\,\%\) on 4elt, \(13.94\,\%\) on Astroph, and \(26.26\,\%\) on Email-Enron when k is 64.

Fig. 6.
figure 6

Vertex-cut-times over graphs with different number of partitions.

From these experiments we draw the following conclusions for these datasets. VSEP-square achieves the best Vertex-cut-number and Vertex-cut-times, but the worst CVP; its CVP is close to that of the other algorithms on non-power-law graphs but much worse on power-law graphs. The Vertex-cut-number and Vertex-cut-times of VSEP-diagonal are better than those of JA-BE-JA-VC, while the CVP of the two algorithms is equal.

6 Conclusion

We presented a parallel edge partitioning algorithm named VSEP (Vertex Swap and Edge Partitioning). The approach only needs read access to the graph, recording the swap process in an array, so it is well suited to distributed computing systems like Spark and GraphLab. VSEP uses two heuristic methods, VSEP-square and VSEP-diagonal, to compute an edge partitioning iteratively. We compared our algorithm with JA-BE-JA-VC, a state-of-the-art algorithm. Experimental results show that VSEP-diagonal reduces the number of vertices that have to be cut by \(9.15\,\%\sim 30.53\,\%\) and the number of vertex cut times by \(9.48\,\%\sim 26.26\,\%\) while retaining balanced partition sizes compared with JA-BE-JA-VC.