Keywords

1 Introduction

The concept of representative pattern mining was proposed to overcome the fatal problems of previous traditional frequent pattern mining such as generating an excessive number of frequent patterns and degrading mining performance. Maximal frequent pattern mining [4, 11, 12], which is one of the major techniques in representative pattern mining, is known for extracting representative patterns more efficiently at the cost of accuracy in pattern restoration. Such an advantage has attracted development of various relevant applications such as bio data analysis [2, 9], uncertain data analysis [5], privacy preserving [6], distributed processing [7], social network analysis [10], and hypergraph dualization [8]. In this paper, we analyze characteristics of recent maximal frequent pattern mining techniques.

The remainder of this paper is as follows. Section 2 introduces the basic concept of frequent pattern mining and its important related works. Section 3 describes recent approaches of maximal frequent pattern mining. Section 4 finally concludes this paper.

2 Frequent Pattern Mining

Since the Apriori algorithm was devised [1], various frequent pattern mining approaches have been proposed. FP-Growth [3] is a tree-based approach that can solve the fatal problems of Apriori such as excessive database scans and candidate pattern creation. Its own tree structure, called FP-tree, and mining techniques have attracted many research attentions. As a result, a variety of variations and applications have been developed. From given databases, frequent pattern mining methods find all of the possible patterns such that their support values are not smaller than user-given minimum support threshold. Frequent patterns can be expressed as the following definitions.

Definition 1.

(Support of a pattern) A database, DB = {T 1, T 2, …, T n }, is composed of multiple transactions, Ts, and each transaction T k = {i 1, i 2, …, i m } also has a number of items, is. Then, pattern A = {i 1, i 2, …, i j }, which can be generated from DB, is a subset of at least one T. The support of A, Sup(A), is calculated as follows:

$$ Support(A) = \sum\limits_{i = 1}^{n} {f(A,T_{i} ),f(A,T_{i} ) = \left\{ {\begin{array}{*{20}c} {1,\,\text{if}\,A \subseteq T} \\ {0,\,otherwise} \\ \end{array} } \right.} $$
(1)

Definition 2.

(Frequent pattern) A measure, called minimum support threshold (denoted as δ), is set by a user and employed to judge whether or not a pattern is frequent. This measure is the product of a user-given percent value and the number of transactions in DB. Then, pattern A is considered as a frequent pattern if the following condition is satisfied:

$$ Support(A) \ge\updelta $$
(2)

Therefore, the main goal of frequent pattern mining is to find all of the possible patterns that satisfy the above condition from a given database.

However, they may extract an excessive number of pattern results depending on features of databases and threshold settings. Since it is difficult to analyze all the mined patterns, we need to consider another approach.

3 Recent Maximal Frequent Pattern Mining Techniques

Maximal frequent pattern mining is a method that mines a smaller number of representative patterns instead of extracting all of the possible frequent patterns. There are two ways to mine representative patterns: closed frequent pattern mining and maximal frequent pattern mining. Although they can extract pattern mining results that can represent frequent patterns, maximal frequent pattern mining can guarantee better pattern condensing effect. Closed frequent pattern mining guarantees complete pattern restoration from closed frequent patterns to original frequent patterns; however, its pattern condensing effect is worse than that of maximal frequent pattern mining. This paper focuses on recent techniques of maximal frequent pattern mining. The maximality feature of a pattern is defined as follows.

Definition 3.

(Maximal frequent pattern) Let X be a pattern, Γ = {X’ 1, X’ 2, …, X’ k } be a set of supersets of X, X’s, and δ be a user-given minimum support threshold. Then, X becomes a maximal frequent pattern if the following conditions are satisfied:

$$ Sup(X) \ge {\updelta} ,{\Upgamma} = \left\{ {X^{{\prime }} \left| {Sup(X^{{\prime }} )} \right. < {\updelta }} \right\} $$
(3)

Through these constraints, we can effectively reduce the number of generated patterns.

The maximality characteristic has the following effect. The left side of Fig. 1 shows an example of frequent patterns. The right side of Fig. 1 is a result of expressing the frequent patterns as representative ones using the maximality feature of them. As shown in the figure, a large number of frequent patterns can be expressed as a smaller number of maximal frequent patterns according to their support characteristics.

Fig. 1.
figure 1

Example of relations among frequent patterns and maximal frequent patterns

WFPmaxWA and WFPmaxSD [12] are tree-based algorithms that extract maximal frequent patterns considering weight factors of items. They use a recursive divide-and-conquer manner and employ different item sorting orders according to algorithm types. They can effectively be utilized in various areas dealing with static data. Theses algorithms scan a given database twice in order to extract weighted maximal frequent patterns. Their basic frameworks follow that of FP-Growth [3]. Therefore, in the process of the first database scan, they calculate a weight ascending order (in the case of WFPmaxWA) or support descending order (in the case of WFPmaxSD) from the given data and prune invalid items. In the second database scanning process, they construct their own tree structures and perform pattern growth works recursively. IM_WMFI [11] is a weighted maximal frequent pattern mining algorithm for processing incremental databases. In contrast to the above ones, the method can be employed effectively in environments where data are continually accumulated. It also follows a tree-based pattern growth manner. Additionally, IM_WMFI constructs its own tree structure and performs mining operations within a single database scan in order to handle such dynamic data efficiently. Since all of the necessary works for mining weighted maximal frequent patterns have to be conducted within a single database scan, the algorithm directly stores given data into its own tree structure at first, and then it performs additional tasks for tree restructuring. In order to increase efficiency of tree restructuring, IM_WMFI divides its tree (the entire task) into a number of paths (smaller tasks) and performs item resorting operations for each path (a divide-and-conquer manner). AWMax [4] is an approach integrating the concepts of approximate pattern mining and maximal frequent pattern mining. The method mines maximal frequent patterns considering error tolerance and weight conditions on noise environments. Accumulated data may have unexpected errors such as noise, device malfunction or failure, record error, etc. In these cases, AWMax can effectively extract weighted maximal frequent patterns considering error tolerance.

Table 1 shows major features of the aforementioned algorithms. According to types of given data and purposes of pattern mining results, users can select and utilize proper algorithms.

Table 1. Features of recent maximal frequent pattern mining algorithms

4 Conclusion

In this paper, we explained the concept and characteristics of maximal frequent pattern mining and conducted analysis of recent relevant methods. Extensive usability of maximal frequent pattern mining led to various techniques and applications. Therefore, users can select and employ appropriate algorithms according to their own situations. The maximality property of patterns can also effectively be applied in various areas that may cause excessive computational overheads such as graph pattern mining, distributed processing of big data, etc. We are scheduled to conduct such expanded studies in our future work.