
1 Introduction

A hypergraph consists of vertices and hyperedges, each of which can connect multiple vertices, and can be seen as a generalization of an ordinary graph. Since hypergraphs can effectively model complex group relationships among entities, they have a wide range of applications, such as bioinformatics [5] and social network analysis [9]. Complex analyses over hypergraphs have also been extensively explored, including hypergraph motifs [6], classification [4] and hyperedge prediction [10]. Network motifs have achieved great success in exploring and discovering local structural features of real-world graphs [7]. However, because ordinary graphs and hypergraphs differ in structure, it is difficult to apply the related techniques directly to hypergraphs. In order to better explore the local structural patterns of real-world hypergraphs, Lee et al. [6] defined hypergraph motifs for the first time. Existing methods demonstrate the importance of hypergraph motifs in revealing the local structural patterns of hypergraphs. However, existing algorithms do not exploit hyperedge relations to improve computational efficiency. This motivates us to fully exploit hyperedge relations (intersection and containment) to accelerate hypergraph motif counting. The major contributions are summarized as follows.

  • We explore the widely existing hyperedge relations in real-world hypergraphs and classify hypergraph motifs according to specific relations. For different types of motifs, by using set theory, we study and demonstrate different optimal calculation methods to reduce the cost of excessive intersections.

  • For the remaining hypergraph motifs that cannot be optimized, we further reduce the cost of the algorithm by preserving the hyperedge intersections when constructing the hyperedge projected graph.

  • We conduct extensive experiments to verify that our algorithm outperforms existing algorithms. In total processing time, our algorithm is more than two times faster than existing algorithms.

2 Related Work

We first examine existing work on network motif counting for ordinary graphs. Most of it applies one of the following three techniques to speed up motif counting: 1) Combinatorics: To speed up exact network motif counting, the existing work [8] adopts computation methods based on combinatorial relations. 2) MCMC sampling: Most approximate network motif counting algorithms estimate the number of motif instances by sampling [2, 3]. 3) Color coding: The approximate network motif counting algorithm [1] randomly colors each vertex and uses this coloring information for random sampling. However, because ordinary graphs and hypergraphs differ in structure, it is difficult to apply these techniques directly to hypergraphs. We also review existing work on hypergraph motifs. Hypergraph motifs are the basic building blocks of hypergraphs, as defined in [6]. Unlike network motifs, they are formed by three connected hyperedges with 26 different connection patterns, and they do not limit the number of vertices. Extensive experiments verify that hypergraph motifs play an important role in revealing local structural patterns of real-world hypergraphs. The only existing exact hypergraph motif counting algorithm is proposed in [6]. Although that algorithm counts hypergraph motifs efficiently, it performs many redundant intersection computations.

3 Hypergraph Motif Classification and Computation Acceleration Based on Hyperedge Relations

3.1 Basic Definition

Definition 1

(Hypergraph). A hypergraph is represented by G = (V, E), where V is a finite set of vertices, E = \(\bigcup _{i=1}^{|E|}\)e\(_i\) is a finite set of hyperedges. Each hyperedge e\(_i\) \(\in \) E is a non-empty subset of V.

Definition 2

(Hyperedge Projected Graph). A hyperedge projected graph of G = (V, E) is an ordinary graph PG = (E, H), where H = {(e\(_i\),e\(_j\)) \(\mid \) e\(_i\) \(\cap \) e\(_j\) \(\ne \) \(\varnothing \)}. We use \(\overline{H}_{ij}\) to denote the intersection of e\(_i\) and e\(_j\), that is, \(\overline{H}_{ij}\) = {v \(\in \) V \(\mid \) v \(\in \) e\(_i\) \(\cap \) e\(_j\)}.
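Definition 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_projected_graph`, the representation of hyperedges as a list of Python sets, and the use of list positions as hyperedge identifiers are all assumptions made for illustration. The sketch collects, for every vertex, the hyperedges containing it, and then accumulates each pairwise intersection vertex by vertex, so the intersections are obtained as a by-product of finding neighbors.

```python
from collections import defaultdict
from itertools import combinations


def build_projected_graph(hyperedges):
    """Return (neighbors, pair_intersections) for a list of hyperedges.

    hyperedges: list of sets of vertices; list positions act as hyperedge ids
    (an assumption of this sketch).
    """
    # For every vertex v, the ids of all hyperedges that contain v.
    edges_of_vertex = defaultdict(list)
    for idx, edge in enumerate(hyperedges):
        for v in edge:
            edges_of_vertex[v].append(idx)

    # Every vertex shared by hyperedges i and j contributes to their intersection.
    # Ids are appended in increasing order, so each pair satisfies i < j.
    pair_intersections = defaultdict(set)
    for v, idxs in edges_of_vertex.items():
        for i, j in combinations(idxs, 2):
            pair_intersections[(i, j)].add(v)

    # Two hyperedges are neighbors exactly when their intersection is non-empty.
    neighbors = defaultdict(set)
    for i, j in pair_intersections:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return dict(neighbors), dict(pair_intersections)
```

For example, calling `build_projected_graph([{1, 2, 3}, {2, 3, 4}, {4, 5}, {6}])` links hyperedges 0 and 1 through the vertices 2 and 3, links hyperedges 1 and 2 through vertex 4, and leaves hyperedge 3 without neighbors.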

Definition 3

(Hypergraph Motif). Given three connected hyperedges {e\(_i\), e\(_j\), e\(_k\)}, hypergraph motifs are used to describe the connectivity patterns of the three connected hyperedges. Formally, a hypergraph motif is a binary vector of size 7 whose elements represent the emptiness of the following seven sets: (1) e\(_i\) \(\setminus \) e\(_j\) \(\setminus \) e\(_k\), (2) e\(_j\) \(\setminus \) e\(_k\) \(\setminus \) e\(_i\), (3) e\(_k\) \(\setminus \) e\(_i\) \(\setminus \) e\(_j\), (4) e\(_i\) \(\cap \) e\(_j\) \(\setminus \) e\(_k\), (5) e\(_j\) \(\cap \) e\(_k\) \(\setminus \) e\(_i\), (6) e\(_k\) \(\cap \) e\(_i\) \(\setminus \) e\(_j\) and (7) e\(_i\) \(\cap \) e\(_j\) \(\cap \) e\(_k\).

Fig. 1. Hypergraph motif and hypergraph motif instance

Fig. 2. The 26 hypergraph motifs

Example 1

As shown in Fig. 1(b), hypergraph motifs can be naturally represented by a Venn diagram. The three circles represent hyperedges e\(_i\), e\(_j\) and e\(_k\), respectively. The three overlapping circles divide the plane into seven parts, which represent the seven sets. We use patterned parts to represent non-empty sets and white parts to represent empty sets. In fact, excluding symmetries and duplicated hyperedges, we can describe the pattern of any three connected hyperedges by one of the 26 hypergraph motifs in Fig. 2. If the connectivity pattern of three hyperedges corresponds to a particular hypergraph motif, we consider the three connected hyperedges an instance of this hypergraph motif. For example, Fig. 1(a) is an instance of hypergraph \(motif\ 6\). It is worth noting that \(motifs\ 17\)–\(22\) in Fig. 2 are open motifs. More intuitively, an open motif is one in which two of the three hyperedges do not intersect. Obviously, given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if their connection pattern (motif) is an open motif, then \(|e_i \cap e_j \cap e_k|=0\).

Definition 4

(Hypergraph Motif Counting). Hypergraph motif counting is to calculate the number of instances corresponding to 26 hypergraph motifs on a hypergraph.
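As a point of reference for Definition 4, the following Python sketch counts motif instances by brute force. It is a naive illustration, not any of the algorithms evaluated in this paper. Several choices are assumptions: the function names are hypothetical, hyperedges are assumed to be distinct Python sets, three hyperedges are treated as connected when at least two of their three pairs intersect, and each motif is identified by the smallest pattern over all orderings of the triple instead of by the numbering 1–26 of Fig. 2, which is not reproduced here.

```python
from collections import Counter
from itertools import combinations, permutations


def pattern(a, b, c):
    """Binary vector over the seven sets of Definition 3 (1 = non-empty, assumed)."""
    regions = (a - b - c, b - c - a, c - a - b,
               (a & b) - c, (b & c) - a, (c & a) - b, a & b & c)
    return tuple(int(bool(region)) for region in regions)


def count_motifs_naive(hyperedges):
    """Count instances per motif, keyed by a canonical pattern (not by motif number)."""
    counts = Counter()
    for e_i, e_j, e_k in combinations(hyperedges, 3):
        # Three hyperedges are connected if at least two of the three pairs overlap.
        overlapping = sum(bool(x & y) for x, y in ((e_i, e_j), (e_j, e_k), (e_i, e_k)))
        if overlapping < 2:
            continue
        # Remove the dependence on hyperedge order by taking the smallest pattern.
        canonical = min(pattern(*order) for order in permutations((e_i, e_j, e_k)))
        counts[canonical] += 1
    return counts
```

This baseline examines every triple of hyperedges and recomputes every intersection, which is the cost that the techniques in the following sections aim to avoid.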

3.2 Double-Single-Inclusion Motifs

Definition 5

(Double-Single-Inclusion Motifs). Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if their connection pattern (motif) satisfies any of the following three conditions \((1)\ |e_i \cap e_j|=|e_j \cap e_k|=|e_j|\); \((2)\ |e_j \cap e_k|=|e_i \cap e_k|=|e_k|\); \((3)\ |e_i \cap e_j|=|e_i \cap e_k|=|e_i|\), we call it a Double-Single-Inclusion Motif (DSI motif for short).

Example 2

As shown in Fig. 2, \(motif\ 1\) and \(motif\ 4\) are DSI motifs. More intuitively, a DSI motif is one in which one hyperedge is contained in both of the other two hyperedges.

Theorem 1

Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if their connection pattern (motif) is a DSI motif, the following conclusions hold: (1) if \(|e_i \cap e_j|=|e_j \cap e_k|=|e_j|\), then \(|e_i \cap e_j \cap e_k|=|e_j|\); (2) if \(|e_j \cap e_k|=|e_i \cap e_k|=|e_k|\), then \(|e_i \cap e_j \cap e_k|=|e_k|\); (3) if \(|e_i \cap e_j|=|e_i \cap e_k|=|e_i|\), then \(|e_i \cap e_j \cap e_k|=|e_i|\).

Proof

We first prove conclusion (1). Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if \(|e_i \cap e_j|=|e_j \cap e_k|=|e_j|\), then e\(_i\) contains e\(_j\) and e\(_k\) also contains e\(_j\). Therefore, \(|e_i \cap e_j \cap e_k|\) = \(|e_j \cap e_k|\) = \(|e_j|\). Conclusions (2) and (3) can be proved similarly. Theorem 1 is proved.

3.3 Single-Double-Inclusion Motifs

Definition 6

(Single-Double-Inclusion Motifs). Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if their connection pattern (motif) satisfies any of the following three conditions \((1)\ |e_i \cap e_j|=|e_j|\) and \(|e_i \cap e_k|=|e_k|\); \((2)\ |e_j \cap e_k|=|e_k|\) and \(|e_i \cap e_j|=|e_i|\); \((3)\ |e_j \cap e_k|=|e_j|\) and \(|e_i \cap e_k|=|e_i|\), we call it a Single-Double-Inclusion Motif (SDI motif for short).

Example 3

As shown in Fig. 2, \(motif\ 3\), \(motif\ 7\) and \(motif\ 8\) are SDI motifs. More intuitively, the SDI motif is the one that has one hyperedge containing the other two hyperedges.

Theorem 2

Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if their connection pattern (motif) is an SDI motif, the following conclusions hold: (1) if \(|e_i \cap e_j|=|e_j|\) and \(|e_i \cap e_k|=|e_k|\), then \(|e_i \cap e_j \cap e_k|=|e_j \cap e_k|\); (2) if \(|e_j \cap e_k|=|e_k|\) and \(|e_i \cap e_j|=|e_i|\), then \(|e_i \cap e_j \cap e_k|=|e_i \cap e_k|\); (3) if \(|e_j \cap e_k|=|e_j|\) and \(|e_i \cap e_k|=|e_i|\), then \(|e_i \cap e_j \cap e_k|=|e_i\cap e_j|\).

Proof

We first prove conclusion (1). Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if \(|e_i \cap e_j|=|e_j|\) and \(|e_i \cap e_k|=|e_k|\), then e\(_i\) contains both e\(_j\) and e\(_k\). Therefore, \(|e_i \cap e_j \cap e_k|\) = \(|(e_i \cap e_j)\cap (e_i \cap e_k)|\) = \(|e_j \cap e_k|\). Conclusions (2) and (3) can be proved similarly. Theorem 2 is proved.

3.4 Single-Single-Inclusion Motifs

Definition 7

(Single-Single-Inclusion Motifs). Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if their connection pattern (motif) satisfies any of the following three conditions \((1)\ |e_i \cap e_j|=|e_j|\) and \(|e_i \cap e_k|\ne |e_k|\); \((2)\ |e_j \cap e_k|=|e_k|\) and \(|e_i \cap e_j|\ne |e_i|\); \((3)\ |e_j \cap e_k|=|e_j|\) and \(|e_i \cap e_k|\ne |e_i|\), we call it a Single-Single-Inclusion Motif (SSI motif for short).

Example 4

As shown in Fig. 2, \(motif\ 5\), \(motif\ 9\) and \(motif\ 10\) are SSI motifs. More intuitively, an SSI motif is one in which one hyperedge contains only one of the other two hyperedges.

Theorem 3

Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if their connection pattern (motif) is an SSI motif, the following conclusions hold: (1) if \(|e_i \cap e_j|=|e_j|\) and \(|e_i \cap e_k|\ne |e_k|\), then \(|e_i \cap e_j \cap e_k|=|e_j \cap e_k|\); (2) if \(|e_j \cap e_k|=|e_k|\) and \(|e_i \cap e_j|\ne |e_i|\), then \(|e_i \cap e_j \cap e_k|=|e_i \cap e_k|\); (3) if \(|e_j \cap e_k|=|e_j|\) and \(|e_i \cap e_k|\ne |e_i|\), then \(|e_i \cap e_j \cap e_k|=|e_i\cap e_j|\).

Proof

We first prove conclusion (1). Given three hyperedges e\(_i\), e\(_j\) and e\(_k\), if \(|e_i \cap e_j|=|e_j|\) and \(|e_i \cap e_k|\ne |e_k|\), then e\(_i\) contains e\(_j\). Therefore, \(|e_i \cap e_j \cap e_k|\) = \(|(e_i \cap e_j) \cap e_k|\) = \(|e_j \cap e_k|\). Conclusions (2) and (3) can be proved similarly. Theorem 3 is proved.

As described in Subsects. 3.2–3.4, we propose three special types of motifs through set theory. We also exploit set theory to state and prove their respective properties. By determining the type of a motif, we can speed up the computation for the corresponding motifs through Theorems 1–3.
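The common idea behind these properties is that whenever one hyperedge is contained in another, the size of the triple intersection equals the size of the intersection between the contained hyperedge and the third hyperedge, so it can be read off from already known pairwise sizes. The Python sketch below illustrates this in constant time. It is a hedged illustration: the function name and argument layout are assumptions, the arguments are the three hyperedge sizes followed by the three pairwise intersection sizes, and the open-motif case from Example 1 is included as well.

```python
def triple_intersection_size(n_i, n_j, n_k, n_ij, n_jk, n_ik):
    """Return |e_i & e_j & e_k| from sizes alone, or None if no shortcut applies.

    n_i, n_j, n_k: hyperedge sizes; n_ij, n_jk, n_ik: pairwise intersection sizes.
    """
    # Open motif: two hyperedges are disjoint, so the triple intersection is empty.
    if 0 in (n_ij, n_jk, n_ik):
        return 0
    # Each case: (size of a hyperedge, size of its intersection with a candidate
    # container, size of its intersection with the third hyperedge).
    cases = (
        (n_i, n_ij, n_ik),  # e_i contained in e_j: result is |e_i & e_k|
        (n_i, n_ik, n_ij),  # e_i contained in e_k: result is |e_i & e_j|
        (n_j, n_ij, n_jk),  # e_j contained in e_i: result is |e_j & e_k|
        (n_j, n_jk, n_ij),  # e_j contained in e_k: result is |e_i & e_j|
        (n_k, n_ik, n_jk),  # e_k contained in e_i: result is |e_j & e_k|
        (n_k, n_jk, n_ik),  # e_k contained in e_j: result is |e_i & e_k|
    )
    for edge_size, with_container, with_third in cases:
        if with_container == edge_size:  # containment detected by comparing sizes
            return with_third
    return None  # no containment: the triple intersection must be computed
```

For e\(_i\) = {1, 2, 3, 4}, e\(_j\) = {2, 3} and e\(_k\) = {3, 4, 5}, the call `triple_intersection_size(4, 2, 3, 2, 1, 2)` detects that e\(_j\) is contained in e\(_i\) and returns \(|e_j \cap e_k| = 1\) without touching any vertex set.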

4 Hypergraph Motif Counting Algorithm Framework Optimization

For the remaining hypergraph motifs that cannot be optimized, we further reduce the overall complexity of the algorithm by preserving the hyperedge pair intersections in the preprocessing stage.

1) Constructing Projected Graph. As a preprocessing step (Lines 1–7), Algorithm 1 builds a complete hyperedge projected graph for subsequent motif counting. It first clears H, which records hyperedge pairs (Line 1). Then it finds all neighbors of each hyperedge \(e_i\) (Lines 2–4), where \(E_v\) denotes the set of all hyperedges containing the vertex v. Finally, it stores each hyperedge pair in H (Line 6). At the same time, it pre-stores the intersection (set of vertices) of the corresponding hyperedge pair in \(\overline{H}\) to accelerate later computation (Line 7). The time complexity of this preprocessing step is \(O(\sum _{(e_i,e_j)\in H}|e_i\cap e_j|)\). Because the algorithm must compute \(e_i\cap e_j\) anyway to find each neighbor \(e_j\) of hyperedge \(e_i\), pre-storing \(e_i\cap e_j\) in \(\overline{H}\) does not affect the time complexity of the algorithm.

2) Motif Counting. Algorithm 1 (Lines 8–12) first finds two neighbors of each hyperedge \(e_i\) to form a hyperedge triple (Lines 8–9), where \(H_{e_{i}}\) denotes all neighbors of hyperedge \(e_i\) in PG. Then it determines whether the three hyperedges belong to one of the special motif types (Line 10). If the hyperedge triple belongs to a DSI, SDI, SSI or open motif, we use the function \(\overline{h}(\{e_{i},e_{j},e_{k}\})\) to determine which motif it belongs to and increment the corresponding position of M (Line 11). Since \(\overline{h}(\{e_{i},e_{j},e_{k}\})\) does not need to compute \(e_i \cap e_j \cap e_k\), its time complexity is O(1). For the remaining motifs, we use the function \(h(\{e_{i},e_{j},e_{k}\},\overline{H})\) (Line 12). Since the algorithm pre-stores the hyperedge pair intersections in \(\overline{H}\), the time complexity of \(h(\{e_{i},e_{j},e_{k}\},\overline{H})\) is \(O(\min (|e_i\cap e_j|,|e_j \cap e_k|,|e_i \cap e_k|))\). In conclusion, the time complexity of our algorithm is better than that of the existing algorithm (\(O(\min (|e_i|,|e_j|,|e_k|))\) in [6]).
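The role of the stored pairwise intersections can be sketched in Python as follows. This is an illustrative reading of the description above, not the pseudocode of Algorithm 1: both function names are hypothetical, the pairwise intersections are assumed to be stored as Python sets with constant-time membership tests, and the pattern encoding (1 for non-empty) is an assumption. The first function scans only the smallest stored pairwise intersection, and the second derives the seven region sizes of Definition 3 by inclusion-exclusion.

```python
def triple_size_from_pairs(h_ij, h_jk, h_ik):
    """Compute |e_i & e_j & e_k| from the three stored pairwise intersections."""
    # A vertex lies in all three hyperedges exactly when it lies in any two of the
    # pairwise intersections, so scanning the smallest one against another suffices.
    smallest, other = sorted((h_ij, h_jk, h_ik), key=len)[:2]
    return sum(1 for v in smallest if v in other)


def pattern_from_sizes(n_i, n_j, n_k, n_ij, n_jk, n_ik, n_ijk):
    """Binary vector over the seven sets of Definition 3, derived from sizes only."""
    sizes = (
        n_i - n_ij - n_ik + n_ijk,  # (1) only in e_i
        n_j - n_jk - n_ij + n_ijk,  # (2) only in e_j
        n_k - n_ik - n_jk + n_ijk,  # (3) only in e_k
        n_ij - n_ijk,               # (4) in e_i and e_j, not in e_k
        n_jk - n_ijk,               # (5) in e_j and e_k, not in e_i
        n_ik - n_ijk,               # (6) in e_k and e_i, not in e_j
        n_ijk,                      # (7) in all three hyperedges
    )
    return tuple(int(size > 0) for size in sizes)
```

For e\(_i\) = {1, 2, 3}, e\(_j\) = {2, 3, 4} and e\(_k\) = {3, 5}, the stored intersections are {2, 3}, {3} and {3}, so `triple_size_from_pairs` inspects a single vertex and returns 1, and `pattern_from_sizes(3, 3, 2, 2, 1, 1, 1)` yields `(1, 1, 1, 1, 0, 0, 1)`.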

figure a

5 Experimental Settings and Results Analysis

5.1 Experimental Settings

1) Competitive Algorithms. The first is a naive algorithm for hypergraph motif counting that does not employ any optimization techniques. We call this algorithm HMC for short and regard it as the baseline method. The second algorithm, HMCO, can be seen as HMC with optimization techniques only for open motifs; it is the exact motif counting algorithm in [6]. The third algorithm, HMCA, can be seen as HMC with optimization techniques for DSI, SDI, SSI and open motifs. Our algorithm, HMCP, can be seen as HMCA with the additional technique of preserving intersections for the remaining hypergraph motifs.

2) Experiment Environment. We obtained the source code of HMCO from the authors of [6]. All source code is compiled with g++ 4.9.3 using the -O3 flag. We conduct all experiments on a PC with a 3.20 GHz Intel i5 CPU and 16 GB RAM.

3) Metrics. We measure the execution time in milliseconds (ms).

4) Datasets. We use 8 real-world datasets (http://konect.cc/) to evaluate the algorithms. The specific information of all real-world datasets is given in Table 1.

Table 1. Real-world dataset statistics

Fig. 3. Total processing time on different datasets (vary algorithm)

5.2 Experimental Results Analysis

1) Total processing time. Figures 3(a)–(f) show the total time when processing the corresponding dataset. Based on the experimental results, we can draw the following conclusions. (i) HMC performs the worst on all datasets, because it employs a brute-force policy and lacks any optimization method. (ii) Simplifying the computation by considering only special motifs also leads to better speedups: HMCA outperforms the existing method HMCO. This is because real-world hypergraphs contain a large number of hyperedge inclusion relations, and our optimization technique exploits these relations to greatly reduce computational overhead. (iii) HMCP maintains its advantage on all datasets. The reason is twofold. First, it uses Theorems 1–3 to reduce redundant intersection calculations. Second, preserving the hyperedge pair intersections in the preprocessing stage speeds up the computation of the remaining hypergraph motifs. In general, HMCP is more than two times faster than the existing method HMCO. On the wang-amazon dataset, HMCP achieves its maximum speedup of four times.

2) Scalability. To test the scalability of our algorithm, we use larger datasets. By varying the number of edges added to the hypergraph, we compare the performance of the four algorithms, as shown in Figs. 4(a)–(b). The conclusion is that HMCP scales better than the other algorithms. This is because our algorithm fully exploits hyperedge relations to provide speedup. It is worth noting that the degree of a hyperedge increases as the number of edges increases. This leads to more hyperedge inclusion relations, so the advantage of HMCP becomes more obvious.

Fig. 4. Processing time on different datasets (vary number of edges)

6 Conclusion

In this paper, we propose effective techniques for accelerating hypergraph motif counting based on hyperedge relations. In our work, we classify hypergraph motifs with different hyperedge relations and demonstrate different optimization methods. For the remaining hypergraph motifs that cannot be optimized, we further reduce the overall complexity of the algorithm. Extensive experiments on real datasets show that our method is superior to the existing solutions.